Abstract

The hairpin completion is an operation on formal languages that has 
been inspired by the hairpin formation in DNA biochemistry and by DNA 
computing. In this paper we investigate the hairpin completion of regular 
languages. 

It is well known that hairpin completions of regular languages are 
linear context-free and not necessarily regular. As regularity of a (lin- 
ear) context-free language is not decidable, the question arose whether 
regularity of a hairpin completion of regular languages is decidable. We 
prove that this problem is decidable and we provide a polynomial time 
algorithm. 

Furthermore, we prove that the hairpin completion of regular lan- 
guages is an unambiguous linear context-free language and, as such, it 
has an effectively computable growth function. Moreover, we show that 
the growth of the hairpin completion is exponential if and only if the 
growth of the underlying languages is exponential and, in case the hair- 
pin completion is regular, then the hairpin completion and the underlying 
languages have the same growth indicator. 

Keywords: Hairpin completion, regular languages and finite automata, 
unambiguous linear languages, rational growth 

1 Introduction 

A DNA strand can be seen as a word over the four-letter alphabet {A, C, G, T} 
where the letters represent the nucleobases Adenine, Cytosine, Guanine, and 
Thymine, respectively. By Watson-Crick base pairing two strands may bond 
to each other if they have opposite orientation and their bases are pairwise 
complementary, where A is complementary to T and C to G; see Fig. [1] for 
a graphic example. Throughout the paper we use the bar-notation for the
Watson-Crick complement and its language theoretic pendant, i.e., $\overline{A} = T$ and
$\overline{C} = G$. For base sequences (or words) we let $\overline{a_1 \cdots a_m} = \overline{a_m} \cdots \overline{a_1}$; thus, $\overline{\,\cdot\,}$ is an
antimorphic involution.

The polymerase chain reaction (PCR) is a technique which is often used
in DNA computing to amplify a template strand or a fragment of the template
strand. Short DNA sequences, so-called primers, bond to a part of the template
and thus select where the extension, the process in which the template is
complemented, will start.

5'-C-G-G-T-T-T-C-A-T-C-A-3'
   | | | | | | | | | | |
3'-G-C-C-A-A-A-G-T-A-G-T-5'

Figure 1: Bonding of two strands: The strands are base-wise complementary and
the first strand has 5'-to-3' orientation whereas the second strand has 3'-to-5'
orientation.

The hairpin completion of a strand can naturally develop during the PCR.
Suppose a strand $\sigma$ can be written as $\sigma = \gamma\alpha\beta\overline{\alpha}$. Then its suffix $\overline{\alpha}$ can
act as a primer to the strand and form an intramolecular base-pairing which
is known as hairpin formation. After the extension process we obtain a new
strand $\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ which we call a hairpin completion of $\sigma$; see Fig. 2. Referring to
[27], $\alpha$ should consist of at least 9 bases, otherwise the bond between $\alpha$ and $\overline{\alpha}$
is too weak.



[Figure 2 (diagram): hairpin formation of a strand, followed by the extension step.]

Figure 2: Hairpin completion of a strand or a word.



Hairpin completions are often seen as undesirable byproducts that occur 
during DNA computations and, therefore, sets of DNA strands have been in- 
vestigated that do not tend to form hairpins or other undesired hybridizations, 
see e. g., [4j [3 [8l [131 HI] and the references within. On the other hand, DNA 
algorithms have been designed that make good use of hairpins and hairpin com-
pletions. For example, the whiplash PCR is a technique where a single DNA
strand computes one run of a non-deterministic GOTO-machine by repetitive
hairpin completions, where the length of the extended part is controlled by stop-
per sequences. Starting with a huge set of strands, all runs of such a machine
can be computed in parallel. Whiplash PCR can be used to solve NP-complete
problems like the Hamiltonian path problem [101 EH [ZH] ■ 

Motivated by the hairpin formation in biochemistry, the hairpin completion 
of formal languages was introduced in 2006 by Cheptea, Martín-Vide, and
Mitrana [3]. This paper continues the investigation of hairpin formation from
a purely formal language theoretical viewpoint. The hairpin completion of lan-
guages $L_1$ and $L_2$ contains all right hairpin completions (as in Fig. 2) of all words
in $L_1$ and all left hairpin completions (a word $\alpha\beta\overline{\alpha}\,\overline{\gamma}$ is extended to the left by
$\gamma$) of all words in $L_2$. A formal definition of this operation is given in Sect. 2.1.
The hairpin completion and some related operations have been investigated in
a series of papers, see e. g., [HI [H [li [H [Ml [H [Hi- 
It is known from [3] that the hairpin completion of regular languages is not
necessarily regular but it is always linear context-free. As regularity of a linear
context-free language (given as grammar) is undecidable, the question arose if
regularity of the hairpin completion of regular languages can be decided. This
question was first posed in 2006 [3]. We answered this question positively at
ICTAC 2009 [6] when we proved that the problem is decidable in polynomial
time. In this first approach we were not precise about the degree of the polyno-
mial; it was about 20. In a later approach, which was presented at CIAA 2010
[5], we improved the decision algorithm and showed that the problem is solv-
able in $O(n^8)$, where $n$ bounds the size of the two input DFAs (deterministic
finite automata), accepting $L_1$ and $L_2$, respectively. Furthermore, for $L_2 = \emptyset$
we provided a time complexity of $O(n^2)$ and for $L_1 = \overline{L_2}$ we provided $O(n^6)$.
In the second paper we also showed that the problem is NL-complete (NL is
the class of problems that are solvable by a non-deterministic algorithm using
logarithmic space); in particular, the problem is contained in Nick's Class which
means it is efficiently solvable in parallel, see e.g., [23]. Moreover, we proved
that the hairpin completion of regular languages has an unambiguous linear rep- 
resentation. Thus, its generating function is an effectively computable rational 
function. 

This paper is organized as follows. In Sect. 2 we formally define the hairpin
completion operation, we lay down our notation, and we briefly introduce the
concepts of formal language theory that we will use later. Then, we start our
investigation of hairpin completions of regular languages in Sect. 3 by providing
an unambiguous linear grammar generating the hairpin completion of two given
regular languages. Sect. 4 is devoted to the polynomial time algorithm that
decides the regularity of the hairpin completion of regular languages. In the final
section, Sect. 5, we discuss the relation of the growth of the hairpin completion
with the growths of the underlying regular languages.

This paper is the journal version of results that have been presented at 
ICTAC 2009. It uses the improvements which were presented at CIAA 2010 
and it contains some additional results. 

2 Preliminaries and Notation 

We assume the reader to be familiar with the fundamental concepts of formal 
language theory and automata theory, see [11].

By $\Sigma$ we denote a finite alphabet with at least two letters which is equipped
with an involution $\overline{\,\cdot\,} : \Sigma \to \Sigma$. An involution for a set is a bijection such that
$\overline{\overline{a}} = a$ for all $a \in \Sigma$. (In a biological setting we may think of $\Sigma = \{A, C, G, T\}$
with $\overline{A} = T$ and $\overline{C} = G$.) We extend this involution to words $a_1 \cdots a_n$ by
$\overline{a_1 \cdots a_n} = \overline{a_n} \cdots \overline{a_1}$. (Just like taking inverses in groups.) For languages, $\overline{L}$
denotes the set $\{\overline{w} \mid w \in L\}$. The set of words over $\Sigma$ is denoted $\Sigma^*$; and the
empty word is denoted by 1. By $\Sigma^{\le m}$ we mean the set of all words with length
at most $m$.
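
The extension of the involution to words can be illustrated with a minimal Python sketch. The names `COMPLEMENT`, `bar`, and `bar_language` are chosen here for illustration only; the example word is the upper strand of Fig. 1.

```python
# Watson-Crick complement on letters: an involution on the alphabet {A, C, G, T}.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def bar(word: str) -> str:
    """Antimorphic extension to words: complement every letter and reverse the order."""
    return "".join(COMPLEMENT[letter] for letter in reversed(word))


def bar_language(language):
    """Apply the involution to every word of a finite language (given as a set)."""
    return {bar(w) for w in language}


# Upper strand of Fig. 1; the result is the lower strand read in 5'-to-3' direction.
print(bar("CGGTTTCATCA"))  # TGATGAAACCG
```

Applying `bar` twice returns the original word, and `bar(u + v) == bar(v) + bar(u)`, which is the antimorphism property.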

Given a word $w$, we denote by $|w|$ its length, by $w[i] \in \Sigma$ its $i$-th letter, and
by $w[i,j]$ we mean $w[i]\,w[i+1] \cdots w[j]$. If $w = xyz$ for some $x, y, z \in \Sigma^*$, then $x$
and $z$ are called prefix and suffix, respectively. A prefix or suffix $x$ of $w$ is said
to be proper if $x \neq w$. The (proper) prefix relation between words $x$ and $w$ is
denoted by $x \le w$ (respectively, $x < w$).

2.1 Hairpin completion

Let $L_1$ and $L_2$ be languages in $\Sigma^*$. By $\kappa$ we denote a (small) constant that
gives a lower bound for the length of primers. We define the hairpin completion
$\mathcal{H}_\kappa(L_1, L_2)$ by

$$\mathcal{H}_\kappa(L_1, L_2) = \{\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma} \mid (\gamma\alpha\beta\overline{\alpha} \in L_1 \lor \alpha\beta\overline{\alpha}\,\overline{\gamma} \in L_2) \land |\alpha| \ge \kappa\}.$$

Three cases are of main interest:

1.) $L_1 = L_2$,

2.) $L_1 = \overline{L_2}$, and

3.) $L_1 = \emptyset$ or $L_2 = \emptyset$.

Compared to the definition of the hairpin completion in [3, 21], case 1 corresponds
to the two-sided hairpin completion and case 3 to the one-sided hairpin
completion. In many biochemical applications a strand and its complement
always co-occur; thus, the assumption $L_1 = \overline{L_1} = L_2$ is natural, too, and it is
covered by case 2.
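
For finite languages the definition can be evaluated by brute force. The following Python sketch is only an illustration under that assumption (the paper itself treats regular languages via automata); the function names and the example words are hypothetical. It uses the observation that a left hairpin completion of $w$ is the image under the involution of a right hairpin completion of $\overline{w}$.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def bar(word):
    """Involution on words: complement each letter and reverse."""
    return "".join(COMPLEMENT[x] for x in reversed(word))


def right_completions(w, kappa):
    """All words w + bar(gamma) where w = gamma alpha beta bar(alpha), |alpha| >= kappa."""
    result = set()
    n = len(w)
    for g in range(n + 1):                        # g = |gamma|
        for m in range(kappa, (n - g) // 2 + 1):  # m = |alpha|
            alpha = w[g:g + m]
            if w[n - m:] == bar(alpha):           # w ends with bar(alpha)
                result.add(w + bar(w[:g]))
    return result


def hairpin_completion(L1, L2, kappa):
    """Hairpin completion of two finite languages given as sets of strings."""
    result = set()
    for w in L1:
        result |= right_completions(w, kappa)
    for w in L2:
        # A left completion of w is the bar-image of a right completion of bar(w).
        result |= {bar(x) for x in right_completions(bar(w), kappa)}
    return result


print(hairpin_completion({"CAGTCT"}, set(), 2))  # {'CAGTCTG'}
```

Here `"CAGTCT"` factorizes as $\gamma = C$, $\alpha = AG$, $\beta = T$, $\overline{\alpha} = CT$, so the only completion appends $\overline{\gamma} = G$.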

2.2 Linear Context-free Grammars and Unambiguity 

A grammar $G$ is a tuple $G = (V, \Sigma, P, S)$ where $V$ is the finite set of non-
terminals, $\Sigma$ is the alphabet (the set of terminals), $P$ is the finite set of pro-
duction rules, and $S \subseteq V$ is the set of axioms. (Note that we allow a set of
axioms rather than the more usual restriction to have exactly one axiom.) A
grammar is called context-free, if every rule in $P$ is of the form $A \to w$ where
$A \in V$ and $w \in (V \cup \Sigma)^*$; a grammar is called linear context-free, or simply
linear, if, in addition, $w$ contains at most one non-terminal. For a context-free
grammar $G$, a derivation step is denoted by $uAv \Rightarrow_G uwv$, where $A \to w$ is a
production rule in $P$ and $u, v \in (V \cup \Sigma)^*$. By $\Rightarrow_G^*$ we denote the reflexive and
transitive closure of $\Rightarrow_G$ and we call $u \Rightarrow_G^* v$ (with $u, v \in (V \cup \Sigma)^*$) a derivation.
The language generated by $G$ is the set of terminal words

$$L(G) = \{w \in \Sigma^* \mid \exists A \in S : A \Rightarrow_G^* w\}.$$

A linear grammar $G$ is said to be unambiguous if for every word $w \in L(G)$,
there is exactly one derivation $A \Rightarrow_G^* w$ where $A \in S$; in particular, there is only
one axiom $A$ that derives $w$. (For general context-free grammars we would
require that there is exactly one left-most derivation $A \Rightarrow_G^* w$; but in case of
linear grammars, these definitions coincide.)

A language L is called (unambiguous) linear if it is generated by an (unam- 
biguous) linear grammar. 

2.3 Generating Functions 

For a profound discussion of formal power series and how the growth of regular
and unambiguous linear languages can be calculated we refer to [1, 2, 9, 17].
We content ourselves with a few basic facts. The growth or generating function
$g_L$ of a formal language $L$ is defined as

$$g_L(z) = \sum_{m \ge 0} |L \cap \Sigma^m| \, z^m.$$

We can view $g_L$ as a formal power series or as an analytic function in one
complex variable where the radius of convergence is strictly positive. The radius
of convergence is at least $1/|\Sigma|$.

It is well-known that the growth of a regular language $L$ is effectively rational,
i.e., it is a quotient of two polynomials, which can be effectively calculated. The
same is true for unambiguous linear languages as soon as we know a generating
unambiguous linear grammar. In particular, the growth is either polynomial
or exponential. If the growth is exponential, then there exists an algebraic
number $\lambda_L \in \mathbb{R}$, its growth indicator, such that $|L \cap \Sigma^m|$ behaves essentially
as $\lambda_L^m$. More precisely, for a language $L$, its growth indicator is defined as the
non-negative real number $\lambda_L$ where

$$\lambda_L = \inf\{\lambda \in \mathbb{R}_{\ge 0} \mid \exists c > 0, \forall m \in \mathbb{N} : |L \cap \Sigma^m| \le c\lambda^m\}.$$

The growth of a language $L$ is

1.) exponential if $\lambda_L > 1$,

2.) sub-exponential but infinite if $\lambda_L = 1$, and

3.) finite if $\lambda_L = 0$.

Note that other values for $\lambda_L$ do not occur and that $\lambda_L$ is the inverse of the
convergence radius of $g_L(z)$. As we discussed above, the growth of an unam-
biguous linear language $L$ is either polynomial or exponential; thus, if $\lambda_L = 1$,
the growth of $L$ can be considered polynomial. Note that regular languages of
polynomial growth have a very restricted form: It is well known that a regular
language has polynomial growth if and only if it can be written as a finite union
of languages of the form $u_0 u_1^* u_2 \cdots u_{2k-1}^* u_{2k}$ where the $u_i$ are words, see e.g., [25].
Thus, the more interesting situation occurs when a language has exponential
growth. It is then that the growth indicator becomes significant.
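
The coefficients $|L \cap \Sigma^m|$ of the generating function of a regular language can be computed from a DFA by counting paths. The sketch below is an illustration only: the function name and the example automaton (words over $\{a, b\}$ without the factor $bb$) are assumptions, not taken from the paper.

```python
def count_words_by_length(delta, start, finals, alphabet, max_len):
    """Return [|L ∩ Σ^0|, ..., |L ∩ Σ^max_len|] for a complete DFA.

    delta maps (state, letter) to the successor state.
    """
    # paths[q] = number of words of the current length leading from start to q
    paths = {start: 1}
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(c for q, c in paths.items() if q in finals))
        successors = {}
        for q, c in paths.items():
            for a in alphabet:
                r = delta[(q, a)]
                successors[r] = successors.get(r, 0) + c
        paths = successors
    return counts


# Hypothetical example: state 0 = last letter is not b, 1 = last letter is b, 2 = sink.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
print(count_words_by_length(delta, 0, {0, 1}, "ab", 5))  # [1, 2, 3, 5, 8, 13]
```

The counts in this example are Fibonacci numbers, so the language has exponential growth and its growth indicator is the golden ratio, which lies strictly between 1 and $|\Sigma| = 2$.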



2.4 Regular Languages and Finite Automata 

Regular languages can be specified by non-deterministic finite automata (NFA)
$\mathcal{A} = (Q, \Sigma, E, \mathcal{I}, \mathcal{F})$, where $Q$ is the finite set of states, $\mathcal{I} \subseteq Q$ is the set of
initial states, and $\mathcal{F} \subseteq Q$ is the set of final states. The set $E$ contains labeled
transitions (or arcs); it is a subset of $Q \times \Sigma \times Q$. For a word $w \in \Sigma^*$ we write
$p \xrightarrow{w} q$, if there is a path from state $p$ to $q$ which is labeled by $w$. Thus, the
accepted language becomes

$$L(\mathcal{A}) = \{w \in \Sigma^* \mid \exists p \in \mathcal{I}, \exists q \in \mathcal{F} : p \xrightarrow{w} q\}.$$

Later it will be crucial to use also paths which avoid final states. For this
we introduce a special notation. First remove all arcs $(p, a, q)$ where $q \in \mathcal{F}$ is a
final state. Thus, final states do not have incoming arcs anymore in this reduced
automaton. Let us write $p \overset{w}{\Longrightarrow} q$, if there is a path in this reduced automaton
from state $p$ to $q$ which is labeled by the word $w$. Note that for such a path
$p \overset{w}{\Longrightarrow} q$ we allow $p \in \mathcal{F}$, but on the path we never meet any final state again.

An NFA is called a deterministic finite automaton (DFA), if it has one initial
state and for every state $p \in Q$ and every letter $a \in \Sigma$ there is exactly one arc
$(p, a, q) \in E$. In particular, a DFA in this paper is always complete, thus we
can read every word to its end. We also write $p \cdot w = q$, if $p \xrightarrow{w} q$. This yields
a (totally defined) function $Q \times \Sigma^* \to Q$, which defines an action of $\Sigma^*$ on $Q$
on the right.



2.5 Notation 

Throughout the paper, $L_1$ and $L_2$ denote fixed regular languages in $\Sigma^*$. We
use a DFA accepting $L_1$ as well as a DFA accepting $L_2$, which works from
right-to-left. However, instead of introducing this concept we use a DFA (work-
ing as usual from left-to-right), which accepts $\overline{L_2}$. This automaton has the
same number of states as (and is structurally isomorphic to) a DFA accept-
ing the reversal language of $L_2$. Our input is therefore given by two DFAs
$\mathcal{A}_i = (Q_i, \Sigma, E_i, \{q_{0i}\}, \mathcal{F}_i)$ for $i = 1, 2$ which accept the languages $L_1$ and $\overline{L_2}$,
respectively. We let $n_1 = |Q_1|$, $n_2 = |Q_2|$, and we let $n = \max\{n_1, n_2\}$ be the
input size.
3 Unambiguity of $\mathcal{H}_\kappa(L_1, L_2)$

In this section we prove that the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ is an unambigu-
ous linear context-free language. The result is not needed for deciding regularity
of $\mathcal{H}_\kappa(L_1, L_2)$, but it came out as a byproduct of the decision procedure. How-
ever, the result turned out to be rather fundamental for the understanding of
hairpin completions of regular languages in general. In particular, it allows us to
compute the growth of $\mathcal{H}_\kappa(L_1, L_2)$ and to compare it with the growths of the
languages $L_1$ and $L_2$, see Sect. 5. Moreover, ideas of this section will be reused
when we provide the algorithm deciding the regularity of $\mathcal{H}_\kappa(L_1, L_2)$. Therefore
we begin with the following result.

Theorem 3.1. The hairpin completion is unambiguous linear context-free. More-
over, there is an effective construction of a generating unambiguous linear gram-
mar $G$ for $\mathcal{H}_\kappa(L_1, L_2)$ such that the size of the grammar $G$ is in $O(n_1^2 n_2^2) \subseteq O(n^4)$.

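Proof. The basic observation is that every word $\pi \in \mathcal{H}_\kappa(L_1, L_2)$ has a unique
factorization $\pi = \gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ such that

1.) $\gamma\alpha\beta\overline{\alpha} \in L_1$ or $\alpha\beta\overline{\alpha}\,\overline{\gamma} \in L_2$,

2.) $|\alpha| = \kappa$,

3.) if a prefix of $\pi$ belongs to $L_1$, then it is a prefix of $\gamma\alpha\beta\overline{\alpha}$, and

4.) if a suffix of $\pi$ belongs to $L_2$, then it is a suffix of $\alpha\beta\overline{\alpha}\,\overline{\gamma}$.

In other words, among all factorizations which satisfy the first condition and
where $|\alpha| \ge \kappa$, we choose the factorization where $|\alpha| = \kappa$ and the length of $\gamma$
is minimal. In such a factorization we call $\gamma\alpha \le \pi$ the minimal gamma-alpha-
prefix of $\pi$. This factorization yields runs in the DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ as in Fig. 3.
(Recall that $\mathcal{A}_2$ accepts $\overline{L_2}$ and $\overline{\pi} = \gamma\alpha\overline{\beta}\,\overline{\alpha}\,\overline{\gamma}$.) As $\pi$ determines the factors $\gamma$
and $\alpha$, the states $c_i$, $d_i$, $e_i$, $f_i$, and $q'_i$ (for $i = 1, 2$) are determined by $\pi$ as well.

$\mathcal{A}_1 : q_{01} \xrightarrow{\gamma} c_1 \xrightarrow{\alpha} d_1 \xrightarrow{\beta} e_1 \xrightarrow{\overline{\alpha}} f_1 \overset{\overline{\gamma}}{\Longrightarrow} q'_1$

$\mathcal{A}_2 : q_{02} \xrightarrow{\gamma} c_2 \xrightarrow{\alpha} d_2 \xrightarrow{\overline{\beta}} e_2 \xrightarrow{\overline{\alpha}} f_2 \overset{\overline{\gamma}}{\Longrightarrow} q'_2$

Figure 3: The runs defined by $\pi \in \mathcal{H}_\kappa(L_1, L_2)$ where $\gamma\alpha$ is the minimal gamma-
alpha-prefix and, therefore, $f_1 \in \mathcal{F}_1$ or $f_2 \in \mathcal{F}_2$.

Vice versa, every path of this form (where $|\alpha| = \kappa$) defines one word $\pi =
\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ from the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ such that $\gamma\alpha$ is its minimal
gamma-alpha-prefix.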
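
The unique factorization can be made concrete with a small Python sketch that searches for the shortest $\gamma$ with $|\alpha| = \kappa$. It is an illustration only: membership in $L_1$ and $L_2$ is passed in as the hypothetical predicates `in_L1` and `in_L2`, whereas the paper works with the DFA runs of Fig. 3.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def bar(word):
    """Involution on words: complement each letter and reverse."""
    return "".join(COMPLEMENT[x] for x in reversed(word))


def minimal_factorization(pi, in_L1, in_L2, kappa):
    """Return (gamma, alpha, beta) with pi = gamma alpha beta bar(alpha) bar(gamma),
    |alpha| = kappa and |gamma| minimal, or None if pi has no such factorization."""
    n = len(pi)
    for g in range(n // 2 + 1):                    # g = |gamma|, tried in increasing order
        if 2 * (g + kappa) > n:
            break                                  # no room left for beta
        if pi[n - (g + kappa):] != bar(pi[:g + kappa]):
            continue                               # pi does not end with bar(gamma alpha)
        # Condition 1: the prefix without bar(gamma) is in L1,
        # or the suffix without gamma is in L2.
        if in_L1(pi[:n - g]) or in_L2(pi[g:]):
            return pi[:g], pi[g:g + kappa], pi[g + kappa:n - (g + kappa)]
    return None


print(minimal_factorization("CAGTCTG", lambda w: w == "CAGTCT", lambda w: False, 2))
# ('C', 'AG', 'T')
```

In this example the candidate $\gamma = 1$ has the right shape but violates condition 1, so the search moves on to $\gamma = C$.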

By this observation, we can use quadruples of states in order to define the un-
ambiguous linear grammar $G$ that generates the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$.
For every $(p_1, p_2, q_1, q_2) \in Q_1 \times Q_2 \times Q_1 \times Q_2$ we define a regular language

$$B(p_1, p_2, q_1, q_2) = \{w \in \Sigma^* \mid p_1 \cdot w = q_1 \land p_2 \cdot \overline{w} = q_2\}.$$

Thus, in Fig. 3 we have $\pi \in B(q_{01}, q_{02}, q'_1, q'_2)$, $\alpha\beta\overline{\alpha} \in B(c_1, c_2, f_1, f_2)$, and
$\beta \in B(d_1, d_2, e_1, e_2)$. Later a tuple $(p_1, p_2, q_1, q_2)$ with $B(p_1, p_2, q_1, q_2) \neq \emptyset$ is
called a basic bridge, see Sect. 4.1.

Furthermore, in our grammar $G$, we let $B(p_1, p_2, q_1, q_2)$ be the non-terminal
that derives all words from the language $B(p_1, p_2, q_1, q_2)$. Compared to Fig. 3
we intend that $B(d_1, d_2, e_1, e_2) \Rightarrow_G^* \beta$. In order to achieve this, it suffices to
introduce the production rules

$$B(p_1, p_2, q_1, q_2 \cdot \overline{a}) \to a\, B(p_1 \cdot a, p_2, q_1, q_2),$$

$$B(p_1, p_2, p_1, p_2) \to 1$$

for $p_1, q_1 \in Q_1$, $p_2, q_2 \in Q_2$, and $a \in \Sigma$. Observe that every derivation from
$B(p_1, p_2, q_1, q_2)$ to a terminal word must use the rule $B(q_1, p_2, q_1, p_2) \to 1$ as
last step. Thus, $B(p_1, p_2, q_1, q_2) \Rightarrow_G^* w$ implies $p_1 \cdot w = q_1$ and $p_2 \cdot \overline{w} = q_2$ as
desired. Furthermore, for all words $w \in B(p_1, p_2, q_1, q_2)$ and all factorizations
$w = uv$ (i.e., $(p_1 \cdot u) \cdot v = q_1$ and $(p_2 \cdot \overline{v}) \cdot \overline{u} = q_2$) there is a derivation

$$B(p_1, p_2, q_1, q_2) \Rightarrow_G^* u\, B(p_1 \cdot u, p_2, q_1, p_2 \cdot \overline{v}) \Rightarrow_G^* uv\, B(q_1, p_2, q_1, p_2) \Rightarrow_G uv$$

where the non-terminal reached after $|u|$ steps is determined. We conclude, the
non-terminal $B(p_1, p_2, q_1, q_2)$ derives all words from the language $B(p_1, p_2, q_1, q_2)$
and the derivation of each word is unambiguous.

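The linear context-free part of the grammar $G$ are the derivations of the min-
imal gamma-alpha-prefixes and the corresponding suffixes. In a similar man-
ner as above, for every quadruple $(p_1, p_2, q_1, q_2) \in Q_1 \times Q_2 \times Q_1 \times Q_2$ we let
$R(p_1, p_2, q_1, q_2)$ be a non-terminal in $G$ and for $p_1, q_1 \in Q_1$, $p_2, q_2 \in Q_2$, and
$a \in \Sigma$ we define a rule

$$R(p_1, p_2, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}) \to a\, R(p_1 \cdot a, p_2 \cdot a, q_1, q_2)\, \overline{a}$$

if $q_1 \cdot \overline{a} \notin \mathcal{F}_1$ and $q_2 \cdot \overline{a} \notin \mathcal{F}_2$, and for $p_1, q_1 \in Q_1$, $p_2, q_2 \in Q_2$, and $\alpha \in \Sigma^\kappa$ we
define a rule

$$R(p_1, p_2, q_1 \cdot \overline{\alpha}, q_2 \cdot \overline{\alpha}) \to \alpha\, B(p_1 \cdot \alpha, p_2 \cdot \alpha, q_1, q_2)\, \overline{\alpha}$$

if $q_1 \cdot \overline{\alpha} \in \mathcal{F}_1$ or $q_2 \cdot \overline{\alpha} \in \mathcal{F}_2$. Observe that the derivations we introduce are again
unambiguous since on a derivation

$$R(q_{01}, q_{02}, q'_1, q'_2) \Rightarrow_G^* u\, R(p_1, p_2, q_1, q_2)\, \overline{u} \Rightarrow_G^* uv\, R(c_1, c_2, f_1, f_2)\, \overline{v}\,\overline{u}$$

the non-terminal $R(p_1, p_2, q_1, q_2)$ is determined by $p_i = q_{0i} \cdot u$ and $q_i = f_i \cdot \overline{v}$
(for $i = 1, 2$). Furthermore, the states $q'_1$, $q'_2$, $q_1$, and $q_2$ cannot be final states
(if $v \neq 1$) and if $f_1 \in \mathcal{F}_1$ or $f_2 \in \mathcal{F}_2$, then we have to use a production rule of
the second form in the next derivation step.

We conclude, there is a situation as in Fig. 3 if and only if

$$R(q_{01}, q_{02}, q'_1, q'_2) \Rightarrow_G^* \gamma\, R(c_1, c_2, f_1, f_2)\, \overline{\gamma} \Rightarrow_G \gamma\alpha\, B(d_1, d_2, e_1, e_2)\, \overline{\alpha}\,\overline{\gamma} \Rightarrow_G^* \gamma\alpha\beta\overline{\alpha}\,\overline{\gamma} = \pi$$

and the derivation of $\pi$ is unambiguous. Thus, we let $R(q_{01}, q_{02}, q'_1, q'_2)$ be the ax-
ioms in the grammar $G$ for all $q'_1 \in Q_1$ and $q'_2 \in Q_2$. Since for each word $\pi$ there
exists at most one axiom $R(q_{01}, q_{02}, q'_1, q'_2)$ such that $R(q_{01}, q_{02}, q'_1, q'_2) \Rightarrow_G^* \pi$
(namely, $q'_1 = q_{01} \cdot \pi$ and $q'_2 = q_{02} \cdot \overline{\pi}$), we see that $G$ is unambiguous linear.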

As for the size of the grammar, observe that the number of non-terminals is
bounded by $2 n_1^2 n_2^2$ and the number of production rules is bounded by



4 Polynomial Time Decision Algorithm 

We consider the following decision problem: 

Input: DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ (with state sets $Q_1$ and $Q_2$) accepting the languages
$L_1$ and $\overline{L_2}$, respectively.

The input size is $n = \max\{|Q_1|, |Q_2|\}$.

Question: Is the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ regular?

The purpose of this section is to prove the following theorem.

Theorem 4.1. The problem whether the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ is reg-
ular is decidable in time

i.) $O(n^2)$ if $L_1 = \emptyset$ or $L_2 = \emptyset$.

ii.) $O(n^6)$ if $L_1 = \overline{L_2}$.

iii.) $O(n^8)$ in general.

The algorithm deciding this problem is divided into Tests 0, 1, 2, and 3. Test 0
yields the time performance in case when $L_1 = \emptyset$ or $L_2 = \emptyset$, yet it is redundant
for the other cases. The tests check properties of an automaton $\mathcal{A}$ which accepts
the minimal gamma-alpha-prefixes, introduced in Sect. 3. We will start with
the construction of $\mathcal{A}$.




4.1 The Automaton $\mathcal{A}$

The non-deterministic automaton $\mathcal{A}$, which we are about to construct, will accept
those words that are a minimal gamma-alpha-prefix of some word $\pi = \gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$,
and the final states of the automaton will determine from which language
$B(d_1, d_2, e_1, e_2)$ we have to choose the factor $\beta$. The construction is analogous
to the definition of rules for the non-terminals $R(p_1, p_2, q_1, q_2)$ in Sect. 3.

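In order to improve the time bound in case when $L_1 = \overline{L_2}$, we introduce the
usual product automaton of $\mathcal{A}_1$ and $\mathcal{A}_2$ with state set

$$Q_{12} = \{(p_1, p_2) \in Q_1 \times Q_2 \mid \exists w \in \Sigma^* : q_{01} \cdot w = p_1 \land q_{02} \cdot w = p_2\}$$

and operation $(p_1, p_2) \cdot w = (p_1 \cdot w, p_2 \cdot w)$ for $(p_1, p_2) \in Q_{12}$ and $w \in \Sigma^*$.
Furthermore, we let $n_{12} = |Q_{12}|$. Note that if $L_2 = \emptyset$ or $L_1 = \overline{L_2}$, then
$n_{12} = n_1 = n$ and in general $n \le n_{12} \le n^2$. (Recall that $n_i = |Q_i|$ for $i = 1, 2$.)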
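
The state set $Q_{12}$ is simply the reachable part of the product of the two DFAs. A minimal Python sketch, assuming each DFA is given as a dictionary from (state, letter) pairs to states (a representation chosen here for illustration, not prescribed by the paper):

```python
from collections import deque


def reachable_product_states(delta1, delta2, start1, start2, alphabet):
    """Return Q12: all pairs (start1 . w, start2 . w) for words w over the alphabet."""
    start = (start1, start2)
    seen = {start}
    queue = deque([start])
    while queue:
        p1, p2 = queue.popleft()
        for a in alphabet:
            successor = (delta1[(p1, a)], delta2[(p2, a)])
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


# Hypothetical two-state DFA that remembers whether the last letter was 'a'.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
print(sorted(reachable_product_states(delta, delta, 0, 0, "ab")))  # [(0, 0), (1, 1)]
```

When both DFAs coincide, as in the example, only the diagonal is reachable, which matches the remark that $n_{12} = n_1$ in that situation.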

Note first that a non-terminal $R(p_1, p_2, q_1, q_2)$ is reachable from an axiom
only if $(p_1, p_2) \in Q_{12}$; hence, we will consider states from $Q_{12} \times Q_1 \times Q_2 \subseteq
Q_1 \times Q_2 \times Q_1 \times Q_2$ for the construction of $\mathcal{A}$. From now on, we call $(p_1, p_2, q_1, q_2)$
a basic bridge if $B(p_1, p_2, q_1, q_2) \neq \emptyset$. This notation is due to the fact that
there is some word that connects the state pairs $(p_1, p_2)$ and $(q_1, q_2)$. It is
easy to see that in case when $(p_1, p_2, q_1, q_2)$ is not a basic bridge, neither the
non-terminal $R(p_1, p_2, q_1, q_2)$ nor the non-terminal $B(p_1, p_2, q_1, q_2)$ is productive
in the grammar $G$. In order to accept the $\alpha$-factor, we also need levels $\ell$ for
$0 \le \ell \le \kappa$; hence there are $\kappa + 1$ levels. By $[\kappa]$ we denote in this paper the set
$\{0, \ldots, \kappa\}$. Define

$$\{((p_1, p_2), q_1, q_2, \ell) \in Q_{12} \times Q_1 \times Q_2 \times [\kappa] \mid (p_1, p_2, q_1, q_2) \text{ is a basic bridge}\}$$

as the state space of $\mathcal{A}$. For $N = n_{12} n_1 n_2 \le n^4$ the size of $\mathcal{A}$ is bounded by
$N \cdot (\kappa + 1) \in O(N) \subseteq O(n^4)$. We have $N \le n^2$ for $L_1 = \emptyset$ or $L_2 = \emptyset$, and
$N \le n^3$ for $L_2 = \overline{L_1}$.

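By a slight abuse of language we call a state $((p_1, p_2), q_1, q_2, \ell)$ a bridge.
Bridges are frequently denoted by $(P, q_1, q_2, \ell)$ with $P = (p_1, p_2) \in Q_{12}$, $q_1 \in Q_1$,
$q_2 \in Q_2$, and $\ell \in [\kappa]$. Bridges are a central concept in the following.

The $a$-transitions in the NFA for $a \in \Sigma$ are given by the following arcs:

$(P, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}, 0) \xrightarrow{a} (P \cdot a, q_1, q_2, 0)$ for $q_i \cdot \overline{a} \notin \mathcal{F}_i$, $i = 1, 2$,

$(P, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}, 0) \xrightarrow{a} (P \cdot a, q_1, q_2, 1)$ for $q_1 \cdot \overline{a} \in \mathcal{F}_1$ or $q_2 \cdot \overline{a} \in \mathcal{F}_2$,

$(P, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}, \ell) \xrightarrow{a} (P \cdot a, q_1, q_2, \ell + 1)$ for $1 \le \ell < \kappa$.

Observe that no state of the form $(P, q_1, q_2, 0)$ with $q_1 \in \mathcal{F}_1$ or $q_2 \in \mathcal{F}_2$
has an outgoing arc to level zero; we must switch to level one. There are no
outgoing arcs on level $\kappa$, and for each $(a, P, q_1, q_2, \ell) \in \Sigma \times Q_{12} \times Q_1 \times Q_2 \times [\kappa - 1]$
there exists at most one arc $(P, q'_1, q'_2, \ell) \xrightarrow{a} (P \cdot a, q_1, q_2, \ell')$. Indeed, the triple
$(q'_1, q'_2, \ell')$ is determined by $(q_1, q_2, \ell)$ and the letter $a$. Not all arcs exist because
$(P, q'_1, q'_2, \ell)$ can be a bridge whereas $(P \cdot a, q_1, q_2, \ell')$ is not. Thus, there are at
most $|\Sigma| \cdot N \cdot \kappa \in O(N)$ arcs in the NFA.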
The set of initial states $\mathcal{I}$ contains all bridges of the form $(Q_0, q'_1, q'_2, 0)$ where
$Q_0 = (q_{01}, q_{02})$. The set of final states $\mathcal{F}$ is given by all bridges $(P, q_1, q_2, \kappa)$ on
level $\kappa$.

For an example and a graphical presentation of the NFA, see Fig. 4.



Li = a* (& I 6)a 



*6a 





A: 



(Qo,ti,t2,0) 



a 


(00,P1,/2,1) 


(O0,/l,t2,0) 


a 


(Qo,Pi,t2,l) 


(O0,/l,/2,0) 


a 


(Qo,pi,P2, 1) 








((30,tl,/2,0) 


a 


(Qo,ti,P2,l) 






(Q0,/l.P2,l) 



B{qoi,qo2,Pl, fl) = ab 
B(qoi,qo2,pi,t2) = aa+b \ a'b 
S(qoi,qo2,Pl,P2) = b 
-8(901, 902, tl,P2) = ''50+ 
-8(901,902, /l,P2) = ba 



Figure 4: DFAs for Li and L2 and the resulting NFA A with 4 initial states 
and 5 final states associated to the (linear context-free) hairpin completion 
T-L^,{Li,L2) = a+ha+ U {a'W \ i>3>l} with n = l. 

Next, we show that the automaton $\mathcal{A}$ encodes the minimal gamma-alpha-
prefixes and that we obtain the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ in a natural way
from $\mathcal{A}$. For languages $B$ and $R$ we denote by $B^R$ the language

$$B^R = \{v\beta\overline{v} \mid \beta \in B \land v \in R\}.$$

(This notation is adopted from group theory where exponentiation denotes con-
jugation and the canonical involution refers to taking inverses.) Clearly, if $B$
and $R$ are regular, then $B^R$ is linear context-free, but not regular in general.
Also note that if $R$ is finite, then $B^R$ is regular.

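Lemma 4.2. Let $M = \mathcal{I} \times \mathcal{F}$. For each pair $\mu = (I, F) \in M$ with $F =
((d_1, d_2), e_1, e_2, \kappa)$ let $R_\mu$ be the (regular) set of words which label a path from
the initial state $I$ to the final state $F$, and let $B_\mu = B(d_1, d_2, e_1, e_2)$.
The hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ is the disjoint union

$$\mathcal{H}_\kappa(L_1, L_2) = \bigcup_{\mu \in M} B_\mu^{R_\mu}.$$

Moreover, for $\mu \in \mathcal{I} \times \mathcal{F}$ and for all words $\beta \in B_\mu$ and $v \in R_\mu$, the minimal
gamma-alpha-prefix of $v\beta\overline{v}$ is $v$.

Proof. Let $\pi \in \mathcal{H}_\kappa(L_1, L_2)$. Let $\gamma\alpha$ be the minimal gamma-alpha-prefix of $\pi$
with $|\alpha| = \kappa$ and factorize $\pi = \gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$. There are runs in the DFAs

$\mathcal{A}_1 : q_{01} \xrightarrow{\gamma} c_1 \xrightarrow{\alpha} d_1 \xrightarrow{\beta} e_1 \xrightarrow{\overline{\alpha}} f_1 \overset{\overline{\gamma}}{\Longrightarrow} q'_1,$

$\mathcal{A}_2 : q_{02} \xrightarrow{\gamma} c_2 \xrightarrow{\alpha} d_2 \xrightarrow{\overline{\beta}} e_2 \xrightarrow{\overline{\alpha}} f_2 \overset{\overline{\gamma}}{\Longrightarrow} q'_2$

where $f_1 \in \mathcal{F}_1$ or $f_2 \in \mathcal{F}_2$ (cf. Fig. 3). Recall that all states on these paths are
determined by $\pi$.

By the definition of the NFA $\mathcal{A}$, we find a path $I \xrightarrow{\gamma} A \xrightarrow{\alpha} F$ where
$I = (Q_0, q'_1, q'_2, 0)$, $A = ((c_1, c_2), f_1, f_2, 0)$, and $F = ((d_1, d_2), e_1, e_2, \kappa)$. As
$\beta \in B(d_1, d_2, e_1, e_2)$, there is a unique $\mu = (I, F) \in \mathcal{I} \times \mathcal{F}$ with $\pi \in B_\mu^{R_\mu}$.

Conversely, let $\mu = (I, F) \in \mathcal{I} \times \mathcal{F}$, let $\beta \in B_\mu$, and let $I \xrightarrow{\gamma} A \xrightarrow{\alpha} F$
with $|\alpha| = \kappa$ be a path in $\mathcal{A}$. As $F$ is a final state it is on level $\kappa$ and $A =
((c_1, c_2), f_1, f_2, 0)$ is the last state on level zero, whence $f_1 \in \mathcal{F}_1$ or $f_2 \in \mathcal{F}_2$.
Therefore, we find runs in the DFAs just like above where $I = (Q_0, q'_1, q'_2, 0)$
and $F = ((d_1, d_2), e_1, e_2, \kappa)$. We conclude $\gamma\alpha$ is the minimal gamma-alpha-prefix of
$\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ and $\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma} \in \mathcal{H}_\kappa(L_1, L_2)$. □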

The next lemma tells us that the paths in the automaton $\mathcal{A}$ are unambiguous.
The arguments are essentially the same as used in Sect. 3. The unambiguity of
paths will become crucial later.

Lemma 4.3. Let $w \in \Sigma^*$ be the label of a path in $\mathcal{A}$ from a bridge $A =
(P, p_1, p_2, \ell)$ to $A' = (P', p'_1, p'_2, \ell')$, then the path is unique. This means that
$B = B'$ whenever $w = uv$ and

$$A \xrightarrow{u} B \xrightarrow{v} A', \qquad A \xrightarrow{u} B' \xrightarrow{v} A'.$$

Proof. It is enough to consider $u = a \in \Sigma$. Let $B = (Q, q_1, q_2, m)$. Then we
have $Q = P \cdot a$ and $q_i = p'_i \cdot \overline{v}$. If $\ell = 0$ and $p_i \notin \mathcal{F}_i$ for $i = 1, 2$, then $m = 0$, too;
otherwise $m = \ell + 1$. Thus, $B$ is determined by $A$, $A'$, and $u$, $v$. We conclude
$B = B'$. □

For the decision algorithm we need to construct the automaton $\mathcal{A}$ within
the time bounds. The automaton can be constructed in time $O(n_1^2 n_2^2)$; that
is $O(n^2)$ in case $L_1 = \emptyset$ or $L_2 = \emptyset$ and $O(n^4)$ otherwise. Recall that the
number of states and the number of transitions are in $O(N) \subseteq O(n_1^2 n_2^2)$ and
that the tuple $(a, P, q_1, q_2, \ell) \in \Sigma \times Q_{12} \times Q_1 \times Q_2 \times [\kappa - 1]$ defines one transition
$(P, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}, \ell) \xrightarrow{a} (P \cdot a, q_1, q_2, \ell')$ (where $\ell'$ is determined by $\ell$, $q_1 \cdot \overline{a}$, and
$q_2 \cdot \overline{a}$) if and only if $(P \cdot a, q_1, q_2, \ell')$ is a bridge. Thus, it suffices to show that
we can compute the set of bridges in time $O(n_1^2 n_2^2)$.

Furthermore, at this stage, we compute the set of all $a$-bridges for $a \in \Sigma$,
where a basic bridge $(d_1, d_2, e_1, e_2)$ is called an $a$-bridge if $B(d_1, d_2, e_1, e_2) \cap
a\Sigma^* \neq \emptyset$. Later, we need the precomputed sets containing all $a$-bridges.

Lemma 4.4. The set containing all basic bridges and the sets containing all
$a$-bridges for $a \in \Sigma$, respectively, can be computed in time $O(n_1^2 n_2^2)$.

Proof. Consider a transition system with state set $Q_1 \times Q_2$ and transitions
$(p_1, q_2) \xrightarrow{a} (q_1, p_2)$ for all $p_1 \cdot a = q_1$ and $p_2 \cdot \overline{a} = q_2$ (we use forward edges in
$\mathcal{A}_1$ and backwards edges in $\mathcal{A}_2$). Note that there are $n_1 n_2 \cdot |\Sigma|$ transitions and
the transition system can be constructed in $O(n_1 n_2)$.

There is a path $(p_1, q_2) \xrightarrow{w} (q_1, p_2)$ with $w \in \Sigma^*$ if and only if $p_1 \cdot w = q_1$
and $p_2 \cdot \overline{w} = q_2$. Thus, a quadruple $(p_1, p_2, q_1, q_2) \in Q_1 \times Q_2 \times Q_1 \times Q_2$ is a basic
bridge if and only if a path from $(p_1, q_2)$ to $(q_1, p_2)$ exists and it is an $a$-bridge
if and only if such a path exists that starts with an $a$-transition.

In order to compute the sets of bridges, we run a depth-first reachability
search for all triples $(p_1, q_2, a) \in Q_1 \times Q_2 \times \Sigma$; for each pair $(q_1, p_2) \in Q_1 \times Q_2$
that is reachable from $(p_1, q_2)$ by a path starting with an $a$-transition, we mark
$(p_1, p_2, q_1, q_2)$ as basic bridge and as $a$-bridge. Since every depth-first search can
be performed in $O(n_1 n_2)$, the whole computation can be done in $O(n_1^2 n_2^2)$. □
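
The reachability search from the proof can be sketched in Python as follows. This is an illustration under assumptions: both DFAs are complete and given as dictionaries from (state, letter) to state, `bar` maps each letter to its complement, and the quadruples $(p_1, p_2, p_1, p_2)$ are added explicitly because the empty word lies in $B(p_1, p_2, p_1, p_2)$.

```python
from collections import defaultdict


def compute_bridges(delta1, delta2, bar):
    """Return (basic, a_bridges): the set of basic bridges (p1, p2, q1, q2) and,
    for every letter a, the set of a-bridges."""
    states1 = {p for (p, _) in delta1}
    states2 = {p for (p, _) in delta2}
    letters = sorted(bar)

    # Transition system on Q1 x Q2:
    # (p1, q2) --a--> (q1, p2) iff p1 . a = q1 and p2 . bar(a) = q2.
    succ = defaultdict(list)
    for p1 in states1:
        for p2 in states2:
            for a in letters:
                q1 = delta1[(p1, a)]
                q2 = delta2[(p2, bar[a])]
                succ[(p1, q2)].append((a, (q1, p2)))

    def reachable(start_nodes):
        """Depth-first search returning all nodes reachable from start_nodes."""
        seen, stack = set(start_nodes), list(start_nodes)
        while stack:
            node = stack.pop()
            for _, successor in succ[node]:
                if successor not in seen:
                    seen.add(successor)
                    stack.append(successor)
        return seen

    basic = set()
    a_bridges = {a: set() for a in letters}
    for p1 in states1:
        for q2 in states2:
            basic.add((p1, q2, p1, q2))  # the empty word connects a pair with itself
            for a in letters:
                first = [target for (b, target) in succ[(p1, q2)] if b == a]
                for (q1, p2) in reachable(first):
                    basic.add((p1, p2, q1, q2))
                    a_bridges[a].add((p1, p2, q1, q2))
    return basic, a_bridges
```

One search is started per triple $(p_1, q_2, a)$ and each search touches every arc of the transition system at most once, which mirrors the counting argument of the lemma for a fixed alphabet.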

Remark 4.5. For convenience, we will henceforth assume that all states in the
automaton $\mathcal{A}$ are reachable from an initial state and lead to some final state. Such
a reachability test can easily be performed in $O(n_1^2 n_2^2)$; thus, this will not
breach the time bounds.



4.2 Test 0

We consider the case when $L_1$ or $L_2$ is finite. In this case we are able to provide a
simple necessary and sufficient condition for the regularity of $\mathcal{H}_\kappa(L_1, L_2)$.

Proposition 4.6.

i.) If the language $L(\mathcal{A})$ is finite, then $\mathcal{H}_\kappa(L_1, L_2)$ is regular.

ii.) If the language $L(\mathcal{A})$ is infinite and either $L_1$ is finite or $L_2$ is finite,
then $\mathcal{H}_\kappa(L_1, L_2)$ is not regular.



Proof. Statement i.) follows directly by Lem. 4.2.
For ii.) let $L(\mathcal{A})$ be infinite. There is a path

$$I \xrightarrow{u} A \xrightarrow{v} A \xrightarrow{w} F$$

in $\mathcal{A}$ where $I$ is an initial bridge, $F = ((d_1, d_2), e_1, e_2, \kappa)$ is a final bridge, and
$A \xrightarrow{v} A$ is a non-trivial loop (by non-trivial we mean $v \neq 1$). Note that $A$ is on
level 0 and hence $|w| \ge \kappa$. Let $\alpha$ be the suffix of $w$ of length $\kappa$ and let $\beta$ be a word
from the language $B(d_1, d_2, e_1, e_2)$. We have $\pi_i = u v^i w \beta \overline{w}\,\overline{v}^i\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$ for
all $i \ge 0$. Moreover, if a prefix of $\pi_i$ belongs to $L_1$, it is a prefix of $u v^i w \beta \overline{\alpha}$ and
if a suffix of $\pi_i$ belongs to $L_2$, it is a suffix of $\alpha \beta \overline{w}\,\overline{v}^i\,\overline{u}$.

By contradiction, assume $\mathcal{H}_\kappa(L_1, L_2)$ is regular and $L_1$ is finite. Let $j \ge 1$
such that the power $v^j$ is idempotent in the syntactic monoid of $\mathcal{H}_\kappa(L_1, L_2)$,
hence

$$\pi = u v^{ij} w \beta \overline{w}\,\overline{v}^{j}\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$$

for $i \ge 1$. We consider $i$ to be huge. More precisely, we assume that $\pi$ is at least
twice as long as the longest word in $L_1$ and that $v^{ij}$ covers more than half of $\pi$.
The longest suffix of $\pi$ that belongs to $L_2$ is still a suffix of $\alpha \beta \overline{w}\,\overline{v}^{j}\,\overline{u}$ which is far
too short to build the hairpin; hence a prefix from $L_1$ has to build the hairpin
and it has to cover more than half of $\pi$, which is a contradiction. By a symmetric
argument $L_2$ is infinite, too. □

We check this property. Although, strictly speaking. Test is redundant for 
the general case. 

Test 0: Decide whether or not L{A) is finite. If it is finite, then stop with the 
output that L2) is regular. If it is not finite but Li or L2 is finite, then 

stop with the output that 'Hk(Li, L2) is not regular. 

In case when Li = % or L2 — % the time complexity follows by the next 
lemma as in these cases we can consider ni = 1 or 712 = 1, respectively. 

Lemma 4.7. Test 0 can be performed in time $\mathcal{O}(n_1^2 n_2^2)$.

Proof. Recall that every state in $\mathcal{A}$ is reachable and co-reachable, by Rem. 4.5.
The language $L(\mathcal{A})$ is infinite if and only if $\mathcal{A}$ contains at least one non-trivial
loop $A \xrightarrow{v} A$. By the well-known algorithm of Tarjan [26] we can decompose
a directed graph (as well as a finite automaton) into its strongly connected
components in linear time with respect to the number of transitions. As the
automaton $\mathcal{A}$ has $\mathcal{O}(n_1^2 n_2^2)$ transitions, this yields the time complexity. □
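The finiteness check used in Test 0 can be sketched as follows. This is a minimal illustration, not the construction of the paper: the automaton is assumed to be given as a plain adjacency dictionary (a hypothetical representation), labels are ignored, and every state is assumed to be reachable and co-reachable. The recursive variant of Tarjan's algorithm is used for brevity and is only suitable for small graphs.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def strongly_connected_components(graph: Graph) -> List[Set[Hashable]]:
    """Tarjan's algorithm (recursive variant, intended for small graphs)."""
    index: Dict[Hashable, int] = {}
    low: Dict[Hashable, int] = {}
    on_stack: Set[Hashable] = set()
    stack: List[Hashable] = []
    components: List[Set[Hashable]] = []
    counter = 0

    def visit(v: Hashable) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component: Set[Hashable] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    for v in nodes:
        if v not in index:
            visit(v)
    return components


def nontrivial_components(graph: Graph) -> List[Set[Hashable]]:
    """Components that contain a cycle: more than one state, or a self-loop."""
    return [
        c for c in strongly_connected_components(graph)
        if len(c) > 1 or any(v in graph.get(v, []) for v in c)
    ]


def accepts_infinite_language(graph: Graph) -> bool:
    """Assumes a trim automaton: every state is reachable and co-reachable."""
    return bool(nontrivial_components(graph))
```

Under the trimness assumption, a non-trivial component is exactly a witness for a non-trivial loop, so the last function mirrors the criterion used in the proof of Lemma 4.7.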



4.3 Test 1 

By Test 0, we may assume in the following that $\mathcal{A}$ accepts an infinite language
and that the set $S$ of non-trivial strongly connected components of the automaton
$\mathcal{A}$ has been computed. Every non-trivial strongly connected component is
on level $0$ and, moreover, as $\mathcal{A}$ accepts an infinite language, there is at least
one. For $s \in S$ let $N_s$ be the number of states in the component $s$. Note that
$\sum_{s \in S} N_s \leq N$. By putting some linear order on the set of bridges, we assign to
each $s \in S$ the least bridge $A_s$ and some shortest, non-empty word $v_s$ such that
$A_s \xrightarrow{v_s} A_s$.

The next lemma tells us that for a regular hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$
every strongly connected component $s \in S$ is a simple cycle, and hence, the
word $v_s$ is uniquely defined.

Lemma 4.8. Let the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ be regular, $s \in S$ be a
strongly connected component, and $A_s \xrightarrow{w} F$ be a path from $A_s$ to a final bridge
$F$. Then the word $w$ is a prefix of some word in $v_s^+$.

In addition, the word $v_s$ is uniquely defined and the loop $A_s \xrightarrow{v_s} A_s$ visits
every other bridge $B \in s \setminus \{A_s\}$ exactly once. Thus it forms a Hamiltonian cycle
of $s$ and $|v_s| = N_s$.

Proof. Let $A = A_s$ and $v = v_s$. Consider a path labeled by $w$ from $A$ to a final
bridge $F = ((d_1, d_2), e_1, e_2, \kappa)$. As all bridges are reachable, we find a word $u$
and an initial bridge $I$ such that

$$I \xrightarrow{u} A \xrightarrow{v} A \xrightarrow{w} F.$$

As the automaton $\mathcal{A}$ accepts $u v^i w$ for all $i \geq 0$, we see that $u v^i w \beta \overline{w}\,\overline{v}^i\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$
for all $i \geq 0$ and all $\beta \in B(d_1, d_2, e_1, e_2)$. As $\mathcal{H}_\kappa(L_1, L_2)$ is regular,
there are $j \geq 1$ and $k > |uw\beta|$ such that $\pi = u v^{jk} w \beta \overline{w}\,\overline{v}^{j}\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$, by pumping.
Due to the definition of $\mathcal{A}$, the longest suffix of $\pi$ belonging to $L_2$ is a suffix of
$\alpha \beta \overline{w}\,\overline{v}^{j}\,\overline{u}$, where $\alpha$ is the suffix of $w$ of length $\kappa$, and this suffix is too short to
create the hairpin completion. This means that the hairpin completion is forced
to use a prefix in $L_1$ and that has to be a prefix of $u v^{jk} w \beta \overline{\alpha}$. Therefore, the
suffix $\overline{w}\,\overline{v}^{j}\,\overline{u}$ is complementary to a prefix of $u v^{jk}$, whence $w$ must be a prefix of
$v^{j(k-1)}$ (see Fig. 5), which proves the first statement of our lemma.



Figure 5: The hairpin of $\pi$ (read the upper part from left to right and the lower
part from right to left).

Recall that $A \xrightarrow{v} A$ is a shortest, non-trivial loop around $A$; hence $|v| \leq N_s$
is obvious. Let $B \in s \setminus \{A\}$ and $x = x_1 x_2$ such that $A \xrightarrow{x_1} B \xrightarrow{x_2} A$. For
some $i, j \geq 1$ we have $|v^i| = |x^j|$. Thus, $v^i = x^j$ by the first statement. By the
unique-path-property stated in Lem. 4.3 we obtain that the loop $A \xrightarrow{x} A$ just
uses the shortest loop $A \xrightarrow{v} A$ several times. In particular, $B$ is on the shortest
loop around $A$. This yields $|v| \geq N_s$ and hence the second statement. □

Example 4.9. In the example given in Fig. 4 the state $(Q_0, t_1, t_2, 0)$ forms the
only strongly connected component and the corresponding path is labeled with
$a$. As one can easily observe, the automaton $\mathcal{A}$ satisfies the properties stated in
Lem. 4.8 (even though the hairpin completion is not regular).



The next test tries to falsify the property of Lem. 4.8. Hence it gives a
sufficient condition that $\mathcal{H}_\kappa(L_1, L_2)$ is not regular.

Test 1: Decide whether there is $s \in S$ and a path $A_s \xrightarrow{w} F$ such that $w$ is not
a prefix of a word in $v_s^+$. If there is such a path, then stop with the output that
$\mathcal{H}_\kappa(L_1, L_2)$ is not regular.

Lemma 4.10. Test 1 can be performed in time $\mathcal{O}(N^2)$.

Proof. For $s \in S$, let $A = A_s$ and compute a shortest non-empty word $v$ such
that $A \xrightarrow{v} A$. If $|v| \neq N_s$, stop with the output that $\mathcal{H}_\kappa(L_1, L_2)$ is not regular.
Otherwise, assign to each bridge that is reachable from $A$ a subset of marks from
$\{0, \ldots, N_s - 1\}$. A mark $i$ is assigned to a bridge $B$ if $B$ is reachable from $A$ with
a word from $v^* v[1, i]$. Test 1 yields that $\mathcal{H}_\kappa(L_1, L_2)$ is not regular if and only if
there is a bridge that is marked by $i$ and that has an outgoing $a$-transition where
$a \neq v[i + 1]$. The marking algorithm can be performed by a depth-first search
that runs in time $\mathcal{O}(N \cdot N_s)$. Summing over all strongly connected components
we deduce a time complexity in $\mathcal{O}\bigl(\sum_{s \in S} N \cdot N_s\bigr) \subseteq \mathcal{O}(N^2)$. □

4.4 Test 2 and 3 

Henceforth, we assume that Test 1 was successful (i. e., Test 1 did not yield that
$\mathcal{H}_\kappa(L_1, L_2)$ is not regular). We fix a strongly connected component $s \in S$ of $\mathcal{A}$.
We let $A = A_s = ((p_1, p_2), q_1, q_2, 0)$, we let $v = v_s$, and we assume $A \xrightarrow{v} A$
forms a Hamiltonian cycle in $s$. By $u$ we denote some word leading from an
initial bridge $((q_{01}, q_{02}), q_1', q_2', 0)$ to $A$. (For the following test we do not need
to know $u$; we just need to know that it exists.) The main idea is to investigate runs
through the DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ where $k, \ell \geq n$ according to Fig. 6.

Figure 6: Runs through $\mathcal{A}_1$ and $\mathcal{A}_2$ based on the loop $A \xrightarrow{v} A$.

We investigate the case when $u v^k x y z \overline{x}\,\overline{v}^{\ell}\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$ for all $k \geq \ell$ and
where (by symmetry) this property is due to the longest prefix belonging to $L_1$.

The following lemma is rather technical. However, the notations are chosen
to fit exactly to Fig. 6.

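Lemma 4.11. Let $x, y, z \in \Sigma^*$ be words and $(d_1, d_2) \in Q_1 \times Q_2$ with the
following properties:

1.) $\kappa \leq |x| < |v| + \kappa$ and $x$ is a prefix of some word in $v^+$.

2.) $0 \leq |y| < |v|$ and $xy$ is the longest common prefix of $xyz$ and some word
in $v^+$.

3.) $z \in B(c_1, c_2, d_1, d_2)$, where $c_1 = p_1 \cdot xy$ and $c_2 = p_2 \cdot x$.

4.) $q_1 = d_1 \cdot \overline{x}\,\overline{v}^{n}$ and during the computation of $d_1 \cdot \overline{x}\,\overline{v}^{n}$ we see after exactly
$\kappa$ steps a final state in $\mathcal{F}_1$ and then never again.

5.) $q_2 = d_2 \cdot \overline{y}\,\overline{x}\,\overline{v}^{n}$ and, let $e_2 = d_2 \cdot \overline{y}\,\overline{x}$, during the computation of $e_2 \cdot \overline{v}^{n}$
we do not see a final state in $\mathcal{F}_2$.

If $\mathcal{H}_\kappa(L_1, L_2)$ is regular, then there exists a factorization $x y z \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ where
$|\delta| = \kappa$, and $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$ (which implies $\delta \beta \overline{\delta}\,\overline{\mu}\,\overline{v}^{*}\,\overline{u} \subseteq L_2$).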

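Proof. The conditions say that $u v^k x y z \overline{x}\,\overline{v}^{\ell}\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$ for all $k \geq \ell \geq n$.
Moreover, by condition 4, the hairpin completion can be achieved with a prefix in
$L_1$, and the longest prefix of $u v^k x y z \overline{x}\,\overline{v}^{\ell}\,\overline{u}$ belonging to $L_1$ is the prefix $u v^k x y z \overline{\alpha}$
where $\overline{\alpha}$ is the prefix of $\overline{x}$ of length $\kappa$.

If $\mathcal{H}_\kappa(L_1, L_2)$ is regular, then we have $u v^k x y z \overline{x}\,\overline{v}^{k+1}\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$, too, as
soon as $k$ is large enough, by a simple pumping argument. For this hairpin
completion we must use a suffix belonging to $L_2$. For $z = 1$, this follows from
$|y| < |v|$. For $z \neq 1$ we use $|y| < |v|$ and, in addition, that $xya$ with $a = z[1]$ is
not a prefix of $vx$ by condition 2.

By condition 5 the longest suffix of $u v^k x y z \overline{x}\,\overline{v}^{k+1}\,\overline{u}$ belonging to $L_2$ is a suffix of
$x y z \overline{x}\,\overline{v}^{k+1}\,\overline{u}$. Thus, we can write

$$u v^k x y z \overline{x}\,\overline{v}^{k+1}\,\overline{u} = u v^k x y z \overline{x}\,\overline{v}\,\overline{v}^{k}\,\overline{u} = u v^k \mu \delta \beta \overline{\delta}\,\overline{\mu}\,\overline{v}^{k}\,\overline{u}$$

where $\delta \beta \overline{\delta}\,\overline{\mu}\,\overline{v}^{k}\,\overline{u} \in L_2$ and $|\delta| = \kappa$. We obtain $x y z \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$. As $p_2 = q_{02} \cdot u$
and $p_2 = p_2 \cdot v$, we conclude $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$ as desired. (Recall that our second
DFA $\mathcal{A}_2$ accepts $\overline{L_2}$.) □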

Example 4.12. Let us take a look at Fig. 4 again. Let $A = (Q_0, t_1, t_2, 0)$, $v = a$
and $u = 1$. If we choose $x = a$, $y = 1$, $z = b$, and $(d_1, d_2) = (p_1, p_2)$ we can
see that conditions 1 to 5 of Lem. 4.11 are satisfied but there is no factorization
$a b \overline{a}\,\overline{a} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa = 1$ such that $q_{02} \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$. Hence, the hairpin
completion is not regular.

We perform Test 2 and Test 3 which, again, try to falsify the property given by
Lem. 4.11 for a regular hairpin completion. The tests distinguish whether the
word $z$ is empty or non-empty.

Test 2: Decide the existence of words $x, y \in \Sigma^*$ and states $(d_1, d_2) \in Q_1 \times
Q_2$ satisfying conditions 1 to 5 of Lem. 4.11 with $z = 1$, but where for all
factorizations $x y \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ we have $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \notin \mathcal{F}_2$. If we find
such a situation, then stop with the output that $\mathcal{H}_\kappa(L_1, L_2)$ is not regular.

Test 3: Decide the existence of words $x, y, z \in \Sigma^*$ with $z \neq 1$ and states
$(d_1, d_2) \in Q_1 \times Q_2$ satisfying conditions 1 to 5 of Lem. 4.11, but where for all
factorizations $x y z \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ we have $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \notin \mathcal{F}_2$. If we find
such a situation, then stop with the output that $\mathcal{H}_\kappa(L_1, L_2)$ is not regular.

Before we analyze the time complexity of Test 2 and Test 3 we will prove
that if the languages $L_1$ and $L_2$ pass the tests described so far, then the hairpin
completion $\mathcal{H}_\kappa(L_1, L_2)$ is regular. Thus, the properties given by Lem. 4.8
and Lem. 4.11 together are sufficient for the regularity of $\mathcal{H}_\kappa(L_1, L_2)$. The time
complexity analysis of Test 2 and Test 3 can be found in Sect. 4.5.

Lemma 4.13. Suppose no outcome of Test 1, Test 2, and Test 3 is that
$\mathcal{H}_\kappa(L_1, L_2)$ is not regular. Then the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ is regular.

Proof. Let $\pi \in \mathcal{H}_\kappa(L_1, L_2)$. Write $\pi = \gamma \alpha \beta \overline{\alpha}\,\overline{\gamma}$ such that $\gamma\alpha$ is the minimal
gamma-alpha-prefix of $\pi$ and $|\alpha| = \kappa$. Therefore, either $\gamma \alpha \beta \overline{\alpha} \in L_1$ or $\alpha \beta \overline{\alpha}\,\overline{\gamma} \in
L_2$; we assume $\gamma \alpha \beta \overline{\alpha} \in L_1$, by symmetry. In addition, we may assume that
$|\gamma| > n^4$ (cf. Prop. 4.6 and Test 0). We can factorize $\gamma = uvw$ with $|uv| \leq n^4$
and $|v| \geq 1$ such that there are runs as in Fig. 7 where $f_1 \in \mathcal{F}_1$.

Figure 7: Runs through $\mathcal{A}_1$ and $\mathcal{A}_2$ for the word $\pi$.

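We infer from Test 1 that $w\alpha$ is a prefix of some word in $v^+$. Hence, we can
write $w \alpha \beta = v^i x y z$ with $i \geq 0$ such that $v^i x y$ is the maximal common prefix of
$w \alpha \beta$ and some word in $v^+$, $w\alpha \in v^* x$ with $\kappa \leq |x| < |v| + \kappa$, and $|y| < |v|$.

We see that for some $k \geq \ell \geq 0$ we can write

$$\pi = u v^k x y z \overline{x}\,\overline{v}^{\ell}\,\overline{u}.$$

Moreover, $u v^k x y z \overline{x}\,\overline{v}^{\ell}\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$ for all $k \geq \ell \geq 0$. There are only
finitely many choices for $u, v, x, y$ (due to the length bounds) and for each of
them there is a regular set $R_z$ associated to the finite collection of bridges such
that

$$\pi \in \{u v^k x y R_z \overline{x}\,\overline{v}^{\ell}\,\overline{u} \mid k \geq \ell \geq 0\} \subseteq \mathcal{H}_\kappa(L_1, L_2).$$

More precisely, we can choose $R_z = \{1\}$ for $z = 1$ and otherwise we can
choose

$$R_z \in \{B(c_1, c_2, d_1, d_2) \cap a\Sigma^* \mid (c_1, c_2, d_1, d_2) \text{ is a bridge and } a \in \Sigma\}.$$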

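Note that the sets $\{u v^k x y R_z \overline{x}\,\overline{v}^{\ell}\,\overline{u} \mid k \geq \ell \geq 0\}$ are not regular in general. If
we bound however $\ell$ by $n$, then the finite union

$$\bigcup_{0 \leq \ell < n} \{u v^k x y R_z \overline{x}\,\overline{v}^{\ell}\,\overline{u} \mid k \geq \ell\}$$

is regular. Thus, we may assume that $\ell \geq n$. Let $e_2 = p_2 \cdot x \overline{z}\,\overline{y}\,\overline{x}$. We have
$e_2 \cdot \overline{v}^{n} = q_2$ and if we see a final state during the computation of $e_2 \cdot \overline{v}^{n}$, then
for all $\ell > k \geq n$ and $z \in R_z$ we see that $u v^k x y z \overline{x}\,\overline{v}^{\ell}\,\overline{u} \in \mathcal{H}_\kappa(L_1, L_2)$, due to a
suffix in $L_2$ and

$$u v^n v^* x y R_z \overline{x}\,\overline{v}^{*}\,\overline{v}^{n}\,\overline{u} \subseteq \mathcal{H}_\kappa(L_1, L_2).$$

Otherwise, Test 2 or Test 3 tells us that for all $z \in R_z$ the word $x y z \overline{x}\,\overline{v}$ has a
factorization $\mu \delta \beta \overline{\delta}\,\overline{\mu}$ such that $|\delta| = \kappa$ and $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$. The paths $q_{02} \cdot u = p_2$
and $p_2 \cdot v = p_2$ yield $\delta \beta \overline{\delta}\,\overline{\mu}\,\overline{v}^{*}\,\overline{u} \subseteq L_2$ and, again,

$$u v^n v^+ x y R_z \overline{x}\,\overline{v}^{+}\,\overline{v}^{n}\,\overline{u} \subseteq \mathcal{H}_\kappa(L_1, L_2).$$

Hence, the hairpin completion $\mathcal{H}_\kappa(L_1, L_2)$ is a finite union of regular languages
and, therefore, regular itself. □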



4.5 Time Complexity of Test 2 and Test 3 

In this section we provide the final step of the proof of Thm. 4.1. We show that
Test 2 can be performed in time $\mathcal{O}(N^2)$ and that Test 3 can be performed in
time $\mathcal{O}(n_{12} n_1^2 n_2^2 n)$. Thus, in case $L_1 = L_2$ both tests run in $\mathcal{O}(n^6)$ and in
general Test 2 runs in $\mathcal{O}(n^8)$ and Test 3 runs in $\mathcal{O}(n^7)$.

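Test 2: Decide the existence of words $x, y \in \Sigma^*$ and states $(d_1, d_2) \in Q_1 \times Q_2$
satisfying

1.) $\kappa \leq |x| < |v| + \kappa$ and $x$ is a prefix of some word in $v^+$,

2.) $0 \leq |y| < |v|$ and $xy$ is a prefix of some word in $v^+$,

3.) $d_1 = p_1 \cdot xy$ and $d_2 = p_2 \cdot x$,

4.) $q_1 = d_1 \cdot \overline{x}\,\overline{v}^{n}$ and during the computation of $d_1 \cdot \overline{x}\,\overline{v}^{n}$ we see after exactly
$\kappa$ steps a final state in $\mathcal{F}_1$ and then never again, and

5.) $q_2 = d_2 \cdot \overline{y}\,\overline{x}\,\overline{v}^{n}$ and, let $e_2 = d_2 \cdot \overline{y}\,\overline{x}$, during the computation of $e_2 \cdot \overline{v}^{n}$
we do not see a final state in $\mathcal{F}_2$

but where for all factorizations $x y \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ we have $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \notin
\mathcal{F}_2$. If we find such a situation, then stop with the output that $\mathcal{H}_\kappa(L_1, L_2)$ is
not regular.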

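Lemma 4.14. Test 2 can be performed in time $\mathcal{O}(N^2)$.

Proof. For a strongly connected component $s \in S$ with $A_s = ((p_1, p_2), q_1, q_2)$
and $v_s = v$, we have to compute all words $x$ and $y$ such that there are runs

$$p_1 \xrightarrow{xy} d_1 \xrightarrow{\overline{x}\,\overline{v}^{n}} q_1, \qquad p_2 \xrightarrow{x} d_2 \xrightarrow{\overline{y}\,\overline{x}\,\overline{v}^{n}} q_2$$

and the conditions 1 to 5 are satisfied. In addition, we demand that during the
computation of $d_2 \cdot \overline{y}\,\overline{x}\,\overline{v}^{n}$ we do not meet any final state in $\mathcal{F}_2$ after more than
$\kappa - 1$ steps. (In case such a final state exists, either condition 5 is breached
or a factorization $x y \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ and $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$ exists.) By
backwards searches in $\mathcal{A}_1$ and $\mathcal{A}_2$ starting at states $q_1$ and $q_2$, respectively,
and searching for paths labelled by suffixes of $\overline{v}^{+}$, we compute all pairs $(x, xy)$
satisfying these conditions in time $\mathcal{O}(N \cdot N_s)$.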

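At this stage we also compute the position $\ell(x, xy)$ of the last final state
during the run $p_2 \cdot v x \overline{y}\,\overline{x}$ and we let $\ell(x, xy) = 0$ if no such state exists. Note
that $0 \leq \ell(x, xy) < N_s + |x| + \kappa$. If a factorization $x y \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$
and $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$ exists, then $|x y \overline{x}\,\overline{v}| - \ell(x, xy)$ gives us a lower bound for the
length of $\mu$.

Let $m(x, xy)$ be the length of the longest $\mu$ such that a factorization $x y \overline{x}\,\overline{v} =
\mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ exists (without the condition $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$).

There is a factorization $x y \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ and $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$ if
and only if $m(x, xy) \geq |x y \overline{x}\,\overline{v}| - \ell(x, xy)$ and $\ell(x, xy) - \kappa \geq |x y \overline{x}\,\overline{v}| / 2$.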
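The quantity $m(x, xy)$ can be made concrete with a brute-force sketch. This is for illustration only and is not the constant-time table lookup that the proof aims for. The function name `longest_mu`, the string representation of words, and the complement table `COMP` (pairing lower-case with upper-case letters) are assumptions introduced here. The helper `bar` implements the antimorphic involution, i.e. it reverses a word and complements each letter.

```python
from typing import Dict, Optional

# Hypothetical complement table: each letter is paired with its upper-case twin.
COMP: Dict[str, str] = {"a": "A", "A": "a", "b": "B", "B": "b"}


def bar(word: str, comp: Dict[str, str] = COMP) -> str:
    """Antimorphic involution: reverse the word and complement every letter."""
    return "".join(comp[c] for c in reversed(word))


def longest_mu(word: str, kappa: int, comp: Dict[str, str] = COMP) -> Optional[int]:
    """Length of the longest mu with word = mu delta beta bar(delta) bar(mu),
    where |delta| = kappa. Returns None if no such factorization exists.

    A factorization with |mu| = m exists iff the suffix of length m + kappa
    equals bar(prefix of length m + kappa) and the two parts do not overlap.
    If this fails for some length it also fails for every larger length,
    so the scan may stop at the first failure.
    """
    best = None
    for m in range(0, len(word) // 2 - kappa + 1):
        t = m + kappa
        if bar(word[:t], comp) == word[len(word) - t:]:
            best = m
        else:
            break
    return best
```

For instance, with $\kappa = 1$ the word `abbBA` admits $\mu = a$, $\delta = b$, $\beta = b$, so the function returns 1, whereas for `abAA` (the shape of the word in Example 4.12 under this encoding) only the empty $\mu$ works.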

We need to precompute the values m{x, xy) efRciently, which turns out to be 
a little bit tricky. For < i < A^s we let Vi = v[i + 1, A^s]i;[l, i] be the conjugate 
of V starting at the {i + l)-st letter. We wish to match position in vf with 
positions in w^. For each < j < Ns we store the maximal k < Ns such that 
''^i bi j + k] = [j, j + k] in a table entry M{i, j), see Fig. HI For each i one run 
(from right to left) over the words vf and v'^ is enough. It takes 0{N'^) time 
to build the table M. Now, if we know the length m' of the longest common 
prefix of and xv, then m(x, xy) — Ixyl+m' ~ k (yet at most |a;?/xw| /2 — k). 
The length of m' is stored in M{\xyx\ mod Ng, (— |ir|) mod Ng), hence we have 
access to $m(x, xy)$ in constant time.

Figure 8: Matching positions of $v_i^+$ with $\overline{v}^{+}$.

All in all Test 2 can be performed in $\mathcal{O}\bigl(\sum_{s \in S} N \cdot N_s\bigr) \subseteq \mathcal{O}(N^2)$. □

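Test 3: Decide the existence of words $x, y, z \in \Sigma^*$ with $z \neq 1$ and states
$(d_1, d_2) \in Q_1 \times Q_2$ satisfying

1.) $\kappa \leq |x| < |v| + \kappa$ and $x$ is a prefix of some word in $v^+$,

2.) $0 \leq |y| < |v|$ and $xy$ is the longest common prefix of $xyz$ and some word
in $v^+$,

3.) $z \in B(c_1, c_2, d_1, d_2)$, where $c_1 = p_1 \cdot xy$ and $c_2 = p_2 \cdot x$,

4.) $q_1 = d_1 \cdot \overline{x}\,\overline{v}^{n}$ and during the computation of $d_1 \cdot \overline{x}\,\overline{v}^{n}$ we see after exactly
$\kappa$ steps a final state in $\mathcal{F}_1$ and then never again, and

5.) $q_2 = d_2 \cdot \overline{y}\,\overline{x}\,\overline{v}^{n}$ and, let $e_2 = d_2 \cdot \overline{y}\,\overline{x}$, during the computation of $e_2 \cdot \overline{v}^{n}$
we do not see a final state in $\mathcal{F}_2$

but where for all factorizations $x y z \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ we have $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \notin
\mathcal{F}_2$. If we find such a situation, then stop with the output that $\mathcal{H}_\kappa(L_1, L_2)$ is
not regular.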

Lemma 4.15. Test 3 can be performed in time $\mathcal{O}(n_{12} n_1^2 n_2^2 n)$.

Proof. For $s \in S$ with $A_s = ((p_1, p_2), q_1, q_2)$ and $v_s = v$ we create two tables
$T_1$ and $T_2$. The table $T_1$ holds all pairs $(c_2, d_1) \in Q_2 \times Q_1$ such that a word $x$
exists with

1.) $\kappa \leq |x| < |v| + \kappa$ and $x$ is a prefix of a word in $v^+$,

2.) $p_2 \cdot x = c_2$,

3.) $d_1 \cdot \overline{x}\,\overline{v}^{n} = q_1$, and during the computation of $d_1 \cdot \overline{x}\,\overline{v}^{n}$ we see a final state
after exactly $\kappa$ steps and then never again.

We call $x$ a witness for $(c_2, d_1) \in T_1$. The table $T_2$ holds all triples $(c_1, d_2, a) \in
Q_1 \times Q_2 \times \Sigma$ such that a proper prefix $y' < v$ exists with

1.) $y'a$ is no prefix of $v$,

2.) $p_1 \cdot y' = c_1$,

3.) $d_2 \cdot \overline{y'}\,\overline{v}^{n} = q_2$, and during the computation of $d_2 \cdot \overline{y'}\,\overline{v}^{n}$ we do not see a
final state after $\kappa$ or more steps.

We call $y'$ a witness for $(c_1, d_2, a) \in T_2$. By backwards computing in the second
component, the tables $T_1$ and $T_2$ can be created in $\mathcal{O}(N_s n_1)$ and $\mathcal{O}(N_s n_2)$,
respectively.

We claim that Test 3 yields that $\mathcal{H}_\kappa(L_1, L_2)$ is not regular if and only if there
exists a pair $(c_2, d_1) \in T_1$ and a triple $(c_1, d_2, a) \in T_2$ such that $(c_1, c_2, d_1, d_2)$ is
an $a$-bridge. Recall that the list of $a$-bridges is precomputed.

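First, assume $(c_2, d_1) \in T_1$, $(c_1, d_2, a) \in T_2$, and $(c_1, c_2, d_1, d_2)$ is indeed an
$a$-bridge. Let $x$ and $y'$ be the witnesses for $(c_2, d_1) \in T_1$ and $(c_1, d_2, a) \in T_2$,
respectively. Choose $z \in B(c_1, c_2, d_1, d_2) \cap a\Sigma^*$ and $y$ such that $xy$ is a prefix
of some word in $v^+$, $|xy| \equiv |y'| \pmod{|v|}$, and $|y| < |v|$. Verify that $x, y, z$ and
$(d_1, d_2)$ satisfy the conditions 1 to 5 of Test 3. However, for any factorization
$x y z \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$, the word $\mu\delta$ has to be a prefix of $xy$, since $xya$ is
no prefix of $vx$. During the computation of $d_2 \cdot \overline{y'}\,\overline{v}^{n}$ we did not see a final state
after more than $\kappa - 1$ steps. The same holds for the computation of $d_2 \cdot \overline{y}\,\overline{x}\,\overline{v}^{n}$
and, therefore, we have $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \notin \mathcal{F}_2$.

Now assume that $x, y, z \in \Sigma^*$, $z \neq 1$, and $(d_1, d_2) \in Q_1 \times Q_2$ exist, which
satisfy the conditions 1 to 5 of Test 3 but where for all factorizations $x y z \overline{x}\,\overline{v} =
\mu \delta \beta \overline{\delta}\,\overline{\mu}$ with $|\delta| = \kappa$ we have $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \notin \mathcal{F}_2$. Choose $y' < v$ such that $|xy| \equiv |y'|
\pmod{|v|}$. Let $c_2 = p_2 \cdot x$, $c_1 = p_1 \cdot y'$ and $a \in \Sigma$ be the first letter of $z$.
Obviously, $(c_1, c_2, d_1, d_2)$ is an $a$-bridge and $x$ is a witness for $(c_2, d_1) \in T_1$. If
we saw a final state after more than $\kappa - 1$ steps during the computation of
$d_2 \cdot \overline{y'}\,\overline{v}^{n}$, then a factorization $x y z \overline{x}\,\overline{v} = \mu \delta \beta \overline{\delta}\,\overline{\mu}$ where $|\delta| = \kappa$ and $p_2 \cdot \mu \delta \overline{\beta}\,\overline{\delta} \in \mathcal{F}_2$
would exist. Thus, $y'$ is a witness for $(c_1, d_2, a) \in T_2$.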

Since the table of $a$-bridges is precomputed (see Lem. 4.4), this test can be
performed in time $\mathcal{O}(|T_1| \cdot |T_2|)$. The set of all first components of $T_1$ (respectively,
$T_2$) is bounded by both, the size $N_s$ and $n_2$ (respectively, $n_1$). Therefore,
we have $|T_1| \in \mathcal{O}(n_1 \cdot \min(N_s, n_2))$ and $|T_2| \in \mathcal{O}(n_2 \cdot \min(N_s, n_1))$. By symmetry,
assume $n_2 \leq n_1$.

Test 3 can be performed in time

$$\mathcal{O}\Bigl(n_{12} n_1^2 n_2 + n_{12} n_1 n_2^2 + \sum_{s \in S,\, N_s > n_2} n_1^2 n_2^2 + \sum_{s \in S,\, N_s \leq n_2} N_s^2 n_1 n_2\Bigr).$$

(Recall that $n_1 \leq n \leq n_{12} \leq n_1 n_2 \leq n^2$ and $\sum_{s \in S} N_s \leq N \leq n_{12} n_1 n_2$.)

Since there are at most $n_{12} n_1$ strongly connected components with a size of
$n_2$ or more states,

$$\sum_{s \in S,\, N_s > n_2} n_1^2 n_2^2 \leq n_{12} n_1^3 n_2^2.$$

For the last term we can use the approximation

$$\sum_{s \in S,\, N_s \leq n_2} N_s^2 n_1 n_2 \leq \sum_{s \in S} N_s n_1 n_2^2 \leq n_{12} n_1^2 n_2^3.$$

We conclude that Test 3 can be performed in time $\mathcal{O}(n_{12} n_1^2 n_2^2 n)$. □



5 Rational Growth 

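Let $L_1' = L_1 \cap \bigcup_{\alpha \in \Sigma^\kappa} \Sigma^* \alpha \Sigma^* \overline{\alpha}$ and $L_2' = L_2 \cap \bigcup_{\alpha \in \Sigma^\kappa} \alpha \Sigma^* \overline{\alpha} \Sigma^*$. Obviously,
$\mathcal{H}_\kappa(L_1', L_2') = \mathcal{H}_\kappa(L_1, L_2)$. Thus, the growth of $\mathcal{H}_\kappa(L_1, L_2)$ should be compared
with the growths of $L_1'$ and $L_2'$ rather than with the growths of $L_1$ and $L_2$.
The languages $L_1'$ and $L_2'$ are still regular and we can compute their growths.
However, to simplify the notation, it is more convenient to assume from the very
beginning that $L_1$ and $L_2$ contain only words that can form hairpins. Formally,
we assume throughout this section that

$$L_1 \subseteq \bigcup_{\alpha \in \Sigma^\kappa} \Sigma^* \alpha \Sigma^* \overline{\alpha}, \qquad L_2 \subseteq \bigcup_{\alpha \in \Sigma^\kappa} \alpha \Sigma^* \overline{\alpha} \Sigma^*.$$

Remember (Sect. 2.3) that the growth indicator $\lambda_L$ of a language $L$ says
that $|L \cap \Sigma^m|$ behaves essentially as $\lambda_L^m$.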

Theorem 5.1. Let $\lambda = \max\{\lambda_{L_1}, \lambda_{L_2}\}$ be the maximum growth indicator of
$L_1$ and $L_2$, and let $\eta$ be the growth indicator of $\mathcal{H}_\kappa(L_1, L_2)$.

i.) The value $\eta$ lies within

$$\sqrt{\lambda} \leq \eta \leq \lambda.$$

In particular, the growth of $\mathcal{H}_\kappa(L_1, L_2)$ is exponential (respectively, polynomial,
finite) if and only if the maximum growth of $L_1$ and $L_2$ is exponential
(respectively, polynomial, finite).

ii.) If $\mathcal{H}_\kappa(L_1, L_2)$ is regular, then we have $\eta = \lambda$. Thus, the growth indicator
of $\mathcal{H}_\kappa(L_1, L_2)$ is the maximum growth indicator of $L_1$ and $L_2$.

The theorem will follow from Lem. 5.3 and Lem. 5.4 in Sect. 5.2, which compare
the growth indicators $\lambda$ and $\eta$ with the growth indicators of the languages
$B_\mu$ and $R_\mu$ for $\mu \in M$. Before we can prove these lemmas, we need some
preliminary observations on growth indicators of (regular) languages.

5.1 Basic Facts about Growth Indicators 

Consider two languages $K_1$ and $K_2$. It is well known that the growth indicator
of their union is $\lambda_{K_1 \cup K_2} = \max\{\lambda_{K_1}, \lambda_{K_2}\}$. Furthermore, if $K_1 \neq \emptyset \neq K_2$ the
growth indicator of their concatenation is $\lambda_{K_1 K_2} = \max\{\lambda_{K_1}, \lambda_{K_2}\}$, too.
Now, let $K$ be a regular language. The prefix closure of $K$ is defined as

$$\mathrm{Pref}(K) = \{u \in \Sigma^* \mid \exists v \in \Sigma^* \colon uv \in K\}.$$

The next lemma shows that the growth indicators of K and its prefix closure 
coincide. Note that this does not necessarily hold if K is (unambiguous) linear. 

Lemma 5.2. Let $K$ be a regular language, then $\lambda_K = \lambda_{\mathrm{Pref}(K)}$.

Proof. As $K \subseteq \mathrm{Pref}(K)$, the inequality $\lambda_K \leq \lambda_{\mathrm{Pref}(K)}$ is obvious.

Conversely, let $k$ be a constant such that $K$ is accepted by a DFA of size $k$
and let $m \in \mathbb{N}$. For a word $u \in \mathrm{Pref}(K) \cap \Sigma^m$, there is some word $v$ such that
$uv \in K$ and, moreover, we may assume $|v| < k$. Let $h$ be a mapping $h \colon u \mapsto uv$
for $u \in \mathrm{Pref}(K) \cap \Sigma^m$ such that $uv \in K$ and $|v| < k$. Note that $h$ is injective
(for a fixed $m$). Thus, we see that

$$|\mathrm{Pref}(K) \cap \Sigma^m| \leq \sum_{i=m}^{m+k} |K \cap \Sigma^i|.$$

For all $\nu > \lambda_K$ there exists $c$ such that $|K \cap \Sigma^m| \leq c \nu^m$ for all $m \in \mathbb{N}$.
Therefore,

$$|\mathrm{Pref}(K) \cap \Sigma^m| \leq \sum_{i=m}^{m+k} c \nu^i \leq c (k+1) \nu^k \nu^m.$$

We conclude $\lambda_{\mathrm{Pref}(K)} \leq \lambda_K$ and as such $\lambda_{\mathrm{Pref}(K)} = \lambda_K$. □
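The counting inequality in the proof can be checked numerically on a small example. The sketch below counts the accepted words of each length by propagating path counts through a DFA; the dictionary encoding of the DFA and the example language (words over $\{a, b\}$ that start with $a$ and end with $b$) are assumptions chosen for illustration. It uses the observation that $\mathrm{Pref}(K)$ is accepted by the same DFA once all co-reachable states are made accepting.

```python
from typing import Dict, Hashable, List, Set

Delta = Dict[Hashable, Dict[str, Hashable]]  # partial transition function


def count_by_length(delta: Delta, start: Hashable,
                    accepting: Set[Hashable], max_len: int) -> List[int]:
    """counts[m] = number of accepted words of length m, for m = 0..max_len."""
    weights = {start: 1}  # number of words of the current length per state
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(c for q, c in weights.items() if q in accepting))
        nxt: Dict[Hashable, int] = {}
        for q, c in weights.items():
            for r in delta.get(q, {}).values():
                nxt[r] = nxt.get(r, 0) + c
        weights = nxt
    return counts


def coreachable(delta: Delta, accepting: Set[Hashable]) -> Set[Hashable]:
    """States from which some accepting state can be reached."""
    states = set(delta) | {r for t in delta.values() for r in t.values()}
    good = set(accepting)
    changed = True
    while changed:
        changed = False
        for q in states - good:
            if any(r in good for r in delta.get(q, {}).values()):
                good.add(q)
                changed = True
    return good


# Example DFA with k = 3 states for K = a (a|b)* b.
DELTA: Delta = {0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 2}}
K_COUNTS = count_by_length(DELTA, 0, {2}, 9)
PREF_COUNTS = count_by_length(DELTA, 0, coreachable(DELTA, {2}), 6)
```

In this example both sequences double with each additional letter, so $K$ and $\mathrm{Pref}(K)$ have the same growth indicator, in line with Lemma 5.2.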

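5.2 Proof of Theorem 5.1

Recall from Lem. 4.2 that the hairpin completion is the disjoint union

$$\mathcal{H}_\kappa(L_1, L_2) = \bigcup_{\mu \in M} B_\mu^{R_\mu}.$$

We let $\sigma_\mu$ and $\rho_\mu$ be the growth indicators of $B_\mu$ and $R_\mu$, respectively. By
$\sigma = \max\{\sigma_\mu \mid \mu \in M\}$ and $\rho = \max\{\rho_\mu \mid \mu \in M\}$ we denote the maximum
growth indicators of all $B_\mu$ and all $R_\mu$, respectively. The next lemma compares
the growth indicator $\lambda$ with the growth indicators $\sigma$ and $\rho$.

Lemma 5.3. $\lambda = \max\{\sigma, \rho\}$.

Proof. We start by proving $\lambda \geq \max\{\sigma, \rho\}$. Let $\mu \in M$ be fixed. For $\gamma\alpha \in R_\mu$
with $|\alpha| = \kappa$, and $\beta \in B_\mu$ either $\gamma \alpha \beta \overline{\alpha} \in L_1$ or $\alpha \beta \overline{\alpha}\,\overline{\gamma} \in L_2$. Thus, we may
define a mapping $h \colon (R_\mu \times B_\mu) \to L_1 \cup L_2$ such that

$$h(\gamma\alpha, \beta) = \begin{cases} \gamma \alpha \beta \overline{\alpha} & \text{if } \gamma \alpha \beta \overline{\alpha} \in L_1, \\ \alpha \beta \overline{\alpha}\,\overline{\gamma} & \text{otherwise.} \end{cases}$$

Obviously, $|\gamma\alpha| + |\beta| = |h(\gamma\alpha, \beta)| - \kappa$. Also note that a word $w \in L_1 \cup L_2$ of
length $m$ can form less than $2m$ hairpin completions. Therefore, the cardinality
of the inverse image is $|h^{-1}(w)| < 2m$. Using the mapping $h$, we can compare
the growth $r_m = |R_\mu B_\mu \cap \Sigma^m|$ with the growth $\ell_m = |(L_1 \cup L_2) \cap \Sigma^m|$; that is

$$r_m < 2(m + \kappa) \cdot \ell_{m+\kappa} \quad \text{for } m \in \mathbb{N}.$$

For $\nu > \lambda = \lambda_{L_1 \cup L_2}$ we choose $\nu'$ from the open interval $(\lambda, \nu)$. There exists
$c' > 0$ such that $r_m < 2(m + \kappa) c' \nu'^{\kappa} \nu'^{m}$ for all $m \in \mathbb{N}$ and, as the function $\nu^m$
grows faster than $m \nu'^{m}$, there is some $c > 0$ such that $r_m < c \nu^m$ for all $m \in \mathbb{N}$.
Therefore, $\max\{\sigma_\mu, \rho_\mu\} \leq \nu$ for all $\nu > \lambda$, whence $\max\{\sigma_\mu, \rho_\mu\} \leq \lambda$. As this
inequality holds for all $\mu \in M$, we deduce $\lambda \geq \max\{\sigma, \rho\}$.

Conversely, we will prove that $L_1$ is included in a language $K$ whose growth
indicator is $\max\{\sigma, \rho\}$. As there is a symmetric language that includes $L_2$,
this yields $\lambda \leq \max\{\sigma, \rho\}$. Let $B = \bigcup_{\mu \in M} B_\mu$ and $R = \bigcup_{\mu \in M} R_\mu$. Let $K$
be the prefix closure $K = \mathrm{Pref}(R B \Sigma^\kappa)$. As the growth indicator of $R B \Sigma^\kappa$ is
$\lambda_{R B \Sigma^\kappa} = \max\{\sigma, \rho\}$, we deduce $\lambda_K = \max\{\sigma, \rho\}$ by Lem. 5.2.

Now, consider $w \in L_1$. By assumption, $w$ can form a hairpin on its right
side. We let $\pi \in \mathcal{H}_\kappa(\{w\}, \emptyset)$ be a hairpin completion of $w$. Let $\gamma\alpha$ be the
minimal gamma-alpha-prefix of $\pi$ with $|\alpha| = \kappa$ and $\beta$ such that $\pi = \gamma \alpha \beta \overline{\alpha}\,\overline{\gamma}$.
Note that $w$ has to be a prefix of $\gamma \alpha \beta \overline{\alpha} \in R B \Sigma^\kappa$ (by the minimality of $|\gamma|$).
Thus, we may conclude $L_1 \subseteq K$ as desired. □

Now, let us compare the growth indicator $\eta$ with the growth indicators $\sigma$
and $\rho$.

Lemma 5.4. $\eta = \max\{\sigma, \sqrt{\rho}\}$.

Proof. Let $\tau_\mu$ be the growth indicator of $B_\mu^{R_\mu}$ for $\mu \in M$. Since $\mathcal{H}_\kappa(L_1, L_2) =
\bigcup_{\mu \in M} B_\mu^{R_\mu}$, we see that $\eta = \max\{\tau_\mu \mid \mu \in M\}$. Thus, in order to prove the
claim, it suffices to show that $\tau_\mu = \max\{\sigma_\mu, \sqrt{\rho_\mu}\}$ for $\mu \in M$. Let $\mu \in M$ be
fixed from here on and recall that $B_\mu$ and $R_\mu$ are non-empty. We let

$$g_{B_\mu}(z) = \sum_{m \geq 0} b_m z^m \quad \text{with } b_m = |B_\mu \cap \Sigma^m|,$$

$$g_{R_\mu}(z) = \sum_{m \geq 0} r_m z^m \quad \text{with } r_m = |R_\mu \cap \Sigma^m|.$$

It will be convenient to let $r_{i+1/2} = 0$ for $i \in \mathbb{N}$.

First, let us prove $\tau_\mu \geq \sigma_\mu$. Let $v \in R_\mu$ and consider $K = v B_\mu \overline{v}$. Obviously,
$K \subseteq B_\mu^{R_\mu}$ and hence $\tau_\mu \geq \lambda_K = \sigma_\mu$.

Next, we prove $\tau_\mu \geq \sqrt{\rho_\mu}$. Let $K = \{\beta\}^{R_\mu} \subseteq B_\mu^{R_\mu}$ for some $\beta \in B_\mu$.
The generating function of $K$ is given as $g_K(z) = \sum_{m \geq 0} r_{(m - |\beta|)/2}\, z^m$. For all
$\nu > \lambda_K$ there exists $c > 0$ such that

$$\forall m \in \mathbb{N} \colon r_{(m - |\beta|)/2} \leq c \nu^m \implies \forall m \in \mathbb{N} \colon r_m \leq c \nu^{|\beta|} (\nu^2)^m$$

and, therefore, $\nu^2 \geq \rho_\mu$. We conclude $\tau_\mu \geq \lambda_K \geq \sqrt{\rho_\mu}$.

Finally, we need to prove $\tau_\mu \leq \max\{\sigma_\mu, \sqrt{\rho_\mu}\}$. As $B_\mu^{R_\mu}$ is unambiguous, we have

$$g_{B_\mu^{R_\mu}}(z) = \sum_{m \geq 0} d_m z^m \quad \text{with } d_m = \sum_{k + \ell = m} b_k r_{\ell/2}.$$

For $\nu > \max\{\sigma_\mu, \sqrt{\rho_\mu}\}$ we choose $\nu'$ from the open interval $(\max\{\sigma_\mu, \sqrt{\rho_\mu}\}, \nu)$.
By that choice, $\nu^m$ grows faster than $m \nu'^{m}$ and there is $c' > 0$ such that for all
$m \in \mathbb{N}$ and $k + \ell = m$, we have $b_k r_{\ell/2} \leq c' \nu'^{m}$. Thus, there is $c > 0$ such that
for all $m \in \mathbb{N}$, the inequality $d_m \leq m c' \nu'^{m} \leq c \nu^m$ holds. This completes the last
step in the proof, $\tau_\mu \leq \max\{\sigma_\mu, \sqrt{\rho_\mu}\}$. □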

Lem. 5.3 and Lem. 5.4 yield a development of the growth indicators $\lambda$ and
$\eta$ as shown in Fig. 9.

Figure 9: Growth indicators $\lambda$ and $\eta$ in dependency of $\sigma$ and $\rho$.

It is easy to see that $\eta$ is at least $\sqrt{\lambda}$ and at most $\lambda$ and, therefore, we deduce
the first statement of Thm. 5.1. The second statement of Thm. 5.1 claims that
if the hairpin completion is regular, then $\lambda = \eta$. In case $\mathcal{H}_\kappa(L_1, L_2)$ is
regular, we infer from Lem. 4.8 that the growth of all $R_\mu$ is polynomial
(more precisely, linear) or finite
(i. e., $\rho = 1$ or $\rho = 0$). We conclude $\lambda = \max\{\sigma, \rho\} = \max\{\sigma, \sqrt{\rho}\} = \eta$.
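The relationship between the two indicators can be illustrated with a few lines of arithmetic. The function below simply evaluates the formulas of Lem. 5.3 and Lem. 5.4 for given values of $\sigma$ and $\rho$; the function name and the sample values are chosen here for illustration, and growth indicators are assumed to be either $0$ or at least $1$.

```python
import math
from typing import Tuple


def growth_indicators(sigma: float, rho: float) -> Tuple[float, float]:
    """Return (lam, eta) with lam = max(sigma, rho), eta = max(sigma, sqrt(rho))."""
    lam = max(sigma, rho)
    eta = max(sigma, math.sqrt(rho))
    return lam, eta


# sigma = 1, rho = 4: the lower bound sqrt(lam) <= eta is attained.
LAM_A, ETA_A = growth_indicators(1.0, 4.0)
# sigma = 3, rho = 4: eta lies strictly between sqrt(lam) and lam.
LAM_B, ETA_B = growth_indicators(3.0, 4.0)
# rho = 1 (the regular case): both indicators coincide.
LAM_C, ETA_C = growth_indicators(2.0, 1.0)
```

Whenever $\rho \in \{0, 1\}$ the two maxima agree, which is exactly the situation of the second statement of Thm. 5.1.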

Final Remarks 

We proved that regularity of a hairpin completion of regular languages is decidable
in polynomial time. Considering the two-sided hairpin completion of
regular languages, the decision algorithm we presented can be performed in
time $\mathcal{O}(n^8)$ (respectively, $\mathcal{O}(n^6)$ in case $L_1 = L_2$) which, at first, seems to
be a high degree for a polynomial time algorithm. However, the first step of the
algorithm is the construction of an automaton $\mathcal{A}$ which is already of size $\mathcal{O}(n^4)$
(respectively, $\mathcal{O}(n^3)$). Thus, when speaking of time complexity with respect
to the size of $\mathcal{A}$, the algorithm uses only quadratic time. Furthermore, as we
take into account all pairs of states of $\mathcal{A}$, the time bound seems optimal for
this approach and further improvement of the time complexity would probably
call for a completely new approach. For the one-sided hairpin completion of a
regular language, we provide a faster algorithm which runs in quadratic time.

The polynomial time bounds are due to the fact that we use DFAs for the
specification of $L_1$ and $L_2$. We do not know what happens if $L_1$ and $L_2$ are
given by NFAs. We suspect that deciding regularity of $\mathcal{H}_\kappa(L_1, L_2)$ might become
PSPACE-complete, but this has not been investigated yet.

By our second result, that the hairpin completion of regular languages is 
always an unambiguous linear language, we are able to effectively compute the 
growth function of the hairpin completion. Moreover, we showed that the hair- 
pin completion has an exponential growth if and only if one of the underlying 
languages has an exponential growth (given that every word from the underly- 
ing languages can form a hairpin). More precisely, the growth indicator of the 
hairpin completion is at most as large as the maximum growth indicator of the 
underlying languages and at least as large as its square root. In case when the 
hairpin completion is regular, we provided an even stronger relationship between 
the growth indicators. In that case, the growth indicator of the hairpin comple- 
tion coincides with the maximum growth indicator of the underlying languages. 
Our results about growths are trivial in case $L_1$ and $L_2$ have polynomial
growths. However, the structure of regular languages with polynomial growths
is well understood [IS] (Sect. 2.3). We believe that a study of hairpin completions
for this class of regular languages might lead to interesting results. We
leave this to future research.

Another interesting problem concerns the hairpin lengthening of regular
languages, which is an operation related to the hairpin completion. We call
$\gamma_1 \alpha \beta \overline{\alpha}\,\overline{\gamma_2}$ a (right) hairpin lengthening of $\gamma_1 \alpha \beta \overline{\alpha}$ if $\gamma_2$ is a suffix of $\gamma_1$ and we
call it a (left) hairpin lengthening of $\alpha \beta \overline{\alpha}\,\overline{\gamma_2}$ if $\gamma_1$ is a prefix of $\gamma_2$. The hairpin
lengthening $\mathcal{HL}_\kappa(L_1, L_2)$ of languages $L_1$ and $L_2$ is introduced analogously to
the hairpin completion. It is known that the hairpin lengthening of regular languages
is linear, but in contrast to the hairpin completion it is not unambiguous
in general, see [15]. This might indicate that deciding regularity of the hairpin
lengthening $\mathcal{HL}_\kappa(L_1, L_2)$ is more difficult than for the hairpin completion. To
date it is not known whether regularity of $\mathcal{HL}_\kappa(L_1, L_2)$ is decidable.